Study guide


You should schedule eighteen study sessions for this book. This includes time for 
working through Computer Book 2, answering the TMA questions and 
consolidating your work on Book 2. You should schedule six study sessions for 
each of Parts I, II and III. 


The sections vary in length. In Part I, Section 1 is shorter than average, and 
Section 4 is longer than average. In Part II, Sections 6 and 7 are both a little 
shorter than average, and Section 9 is longer than average. In Part III, Sections 12 
and 14 are a little longer than average, and Section 13 is shorter than average. 


As you study this book you will be asked to work through Computer Book 2. We 
recommend that you work through the chapters at the points indicated in the 
text. Your work on each part of the book will include two computer sessions. 
However, if you wish, you can postpone the first computer session in each part of 
the book and study it with the second at the end of your study of that part. 


A possible study pattern is as follows. 


Part I 


Study session 1: Section 1. 

Study session 2: Section 2. 

Study session 3: Section 3. You will need access to your computer for this session. 
Study session 4: Section 4. 

Study session 5: Section 5. You will need access to your computer for this session. 
Study session 6: TMA questions on Part I. 


Part II


Study session 7: Section 6. 

Study session 8: Section 7. 

Study session 9: Section 8. You will need access to your computer for this session. 
Study session 10: Subsections 9.1 and 9.2. 

Study session 11: Subsection 9.3 and Section 10. You will need access to your 
computer for this session. 

Study session 12: TMA questions on Part II. 


Part III 


Study session 13: Section 11. You will need access to your computer for this 
session. 

Study session 14: Section 12. 

Study session 15: Section 13. 

Study session 16: Section 14. 

Study session 17: Section 15. You will need access to your computer for this 
session. 


Study session 18: TMA questions on Part III and consolidation of your work on 
this book. 


Introduction 


A time series is a sequence of observations made over time. Data of this type are 
probably among the most commonly encountered: open any magazine or 
newspaper, watch any television news programme, and you are likely to be 
presented with a graph showing a time series — of house prices, performance 
indicators of one kind or another, share indices, or changes in voters’ intentions. 
Many automatic monitoring devices generate time series data in vast quantities, 
from air quality monitors to seismometers (which measure movements in the 
Earth’s crust). The availability of time series data has been greatly enhanced by 
the publication of official statistics on the internet. 








Some of the main modern statistical methods for analysing time series data are 
introduced in this book. There are three main reasons why special methods for 
analysing such data are required. First, temporal patterns — that is, patterns 
occurring over time, such as trends and seasonal variation — are often of interest, 
and special methods are needed to display and analyse these patterns. Some of 
these methods are discussed in Part I. Secondly, forecasts of future values of a 
time series are often required. Some commonly used forecasting methods are 
discussed in Part II. A third, more technical reason why special methods are 
needed is that observations taken at different time points cannot usually be 
assumed to be independent; for example, the ambient temperatures in a given 
location on successive days are likely to be positively correlated. Many standard 
statistical methods, including those that you have met so far in this course, apply 
only to independent observations, and a different approach is required to cope 
with correlated observations. A flexible modelling framework for this purpose is 
discussed in Part III. 





Time series are among the earliest data to have been collected in a systematic 
fashion. Nevertheless, statistical methods for the analysis of time series data were 
only formalized in the second half of the twentieth century. Today, time series 
modelling is a major area of statistics, which makes use of much elegant (and 
sometimes rather difficult) mathematics. In keeping with the aims of this course, 
this book will avoid the technicalities. Instead, it will concentrate on important 
concepts, with an emphasis on practical modelling using the statistical
package SPSS.








Part I Decomposition models 


Introduction to Part I 


Time series data can be characterized by different components, each of which may 
represent a feature of particular interest. In some situations, the main issue of 
interest is whether the data show a general upward or downward trend. In others, 
it is the variation within an annual cycle that is most relevant. Occasionally, the 
variation that remains after trends and seasonal fluctuations have been accounted 
for is the primary focus of an analysis. 


In Part I, you will learn how to identify, display, combine and estimate the 
components of a time series. In Section 1, time series are defined, and their 
typical features are described and illustrated with practical examples. In
Section 2, a modelling framework for time series is introduced, and methods for
choosing an appropriate model are described. In Section 3, you will learn how to 
use SPSS to enter, display and transform time series data. Methods for estimating 
the components of a time series are described in Section 4. Finally, in Section 5, 
you will learn how to use SPSS to analyse time series data using this 
component-based approach. 


1 Time series and their components 


In this section, time series are introduced and their features are described. In 
Subsection 1.1, graphical methods for displaying time series data are presented. 
In Subsection 1.2, the components of a time series are defined, and graphical 
methods are used to describe these components. 





1.1 Presenting time series data 


In its most general form, a time series is a collection of observations $X_t$ on some
random variable $X$ at different times $t = t_1, t_2, \ldots$. Activity 1.1 provides an
example of a typical time series.


Activity 1.1 Visits overseas by UK residents 


The Office for National Statistics publishes a monthly series of the number of 
visits overseas by UK residents. The data are derived from the International 
Passenger Survey (IPS), a sample survey of around 250000 interviews carried out 
per year. 

The data collected for each month from January 1980 to December 2004 are
presented in Table 1.1. Each entry in the table is the number of thousands of
visits overseas by UK residents during a particular month and year. The month is
specified in the row at the top of the table, and the year is given in the column on
the left-hand side of the table.

These data were obtained in July 2005 from the National Statistics website
http://www.statistics.gov.uk.


Crown copyright material is 
reproduced with the permission 
of the Controller of HMSO. 


Table 1.1 Monthly series of number of visits overseas by UK residents 
(thousands) 


Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec


1607 | 2116 
2074 | 2391 
2080 | 2133 


2535 | 2468 
2703 | 2317 
2912 | 2993 
2762 | 2968 
2994 | 3329 
3360 | 3486 
33295 | 3299 
3309 | 3911 
3818 | 4604 
A244 | 4605 





4568 | 4693 
4995 | 4589 
4848 | 4695 





4830 | 5401 
5244 | 5075 


1424 | 1014 
1465 | 1055 
1603 | 1098 
1695 | 1224 
1647 | 1398 
1884 | 1215 
1929 | 1386 
2075 | 1477 
2240 | 1767 
2388 | 2100 
2537 | 2362 
2975 | 2508 
3459 | 2719 
3661 | 2906 
3465 | 2802 
3747 | 3213 
3919 | 2935 
3908 | 3359 
4080 | 3494 
4260 | 3693 





(a) Look down the columns of Table 1.1. Comment briefly on the variation from
year to year.

(b) Now look across the rows. Identify the peak months for visits overseas by UK
residents.


In this course, time series in which the observations $X_t$ are made at
equally-spaced time intervals are considered. These time points may be labelled
$1, 2, 3, \ldots, t, t+1, \ldots$. It will also be assumed that $X_t$ is continuous, or that it
can be treated as if it were continuous (as is the case in Activity 1.1).





Time series data in tabular form can be used to identify the presence of marked 
trends over time and clear seasonal effects, as illustrated in Activity 1.1. However, 
it is much more difficult to identify subtle features, such as the broad shape of the 
trend, and the magnitude of the seasonal variation. For this reason, time series 
data are usually presented graphically rather than in tabular form. The most 
natural plot is the time plot, in which the observed values $x_t$ are plotted against
time $t$.

Example 1.1 Time plot of visits overseas by UK residents


The time plot for the data in Table 1.1 is shown in Figure 1.1.

Figure 1.1 Monthly visits overseas by UK residents





To obtain this plot, the data in Table 1.1 have been reorganized into two columns:
a column of time points (successive months between January 1980 and
December 2004) and a column of observations on the numbers of visits. Notice
that, even though the observations correspond to discrete time points (in this
case, months), successive points are joined by straight lines. This enhances the
display by creating an impression of change over time. The upward trend and the
seasonal variation that you identified in Activity 1.1 are immediately apparent:
the presence of an increasing trend is indicated by the overall upward drift in the
time series; the seasonal variation produces the regular fluctuations that give the
plot its sawtoothed appearance.
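
The reorganization of a year-by-month table into two columns can be sketched in Python. This is only an illustration: the function name `table_to_series` is hypothetical, and the small table below contains made-up values, not the data of Table 1.1.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def table_to_series(table):
    """Flatten a {year: [12 monthly values]} table into (time label, value) pairs.

    The pairs are returned in time order, whatever the order of the years
    in the input table.
    """
    series = []
    for year in sorted(table):
        values = table[year]
        if len(values) != len(MONTHS):
            raise ValueError(f"year {year} does not have 12 monthly values")
        for month, value in zip(MONTHS, values):
            series.append((f"{month} {year}", value))
    return series


# Hypothetical values, for illustration only (not the data of Table 1.1).
example_table = {
    2001: [10, 11, 12, 14, 16, 19, 22, 24, 18, 15, 11, 10],
    2000: [9, 10, 11, 13, 15, 18, 21, 23, 17, 14, 10, 9],
}

series = table_to_series(example_table)
print(series[0], series[-1])
```

The first column of the result holds the time points and the second the observations, which is the layout needed for a time plot.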





Activity 1.2 Interpreting time plots 


Some further features of the data revealed by the time plot in Figure 1.1 are 
considered in this activity. 


Use the time plot to answer the following questions. 


(a) In your opinion, is the trend from year to year linear (that is, straight), or 
curved? 


(b) The extent of the seasonal variation can be measured by the size of the 
fluctuations, that is, by the difference between the maximum value and the 
minimum value within each year. In your opinion, is the size of these 
fluctuations constant, or does it vary over time? 

Graphical methods play an important role in time series analysis, perhaps even
more so than in other areas of statistics. Nevertheless, they have limitations. In
particular, it is not always easy to identify trends and seasonality, or other
features, from a time plot. This is illustrated in Example 1.2.


Example 1.2 Central England temperatures, 1659-2004 


The time series of Central England temperatures gives monthly average surface
air temperatures expressed in degrees Celsius (°C) for a region representative of
the English Midlands. This time series is remarkable in that it dates back to 1659.
The series is routinely updated by the Meteorological Office’s Hadley Centre for
Climate Prediction and Research.


For each year, the annual average temperature is obtained by calculating the
mean of the twelve monthly values of the original series. A time plot of the annual
average temperatures is shown in Figure 1.2.

Figure 1.2 Annual average temperatures (°C), 1659-2004
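
The calculation of annual averages described in Example 1.2 can be sketched as follows. The function name `annual_means` and the monthly values are hypothetical; they are not the Central England data.

```python
def annual_means(monthly):
    """Return {year: mean of its twelve monthly values}.

    Years that do not have exactly twelve values are skipped, since their
    mean would not cover a complete annual cycle.
    """
    means = {}
    for year, values in monthly.items():
        if len(values) == 12:
            means[year] = sum(values) / 12
    return means


# Hypothetical monthly temperatures in degrees Celsius (illustration only).
monthly_temps = {
    1: [3.0, 4.0, 6.0, 8.0, 11.0, 14.0, 16.0, 16.0, 13.0, 10.0, 6.0, 4.0],
    2: [2.0, 3.0, 5.0, 8.0, 12.0, 15.0, 17.0, 16.0, 14.0, 10.0, 7.0, 5.0],
}

means = annual_means(monthly_temps)
print(means)
```

Each annual mean is taken over one complete seasonal cycle, so the within-year pattern does not appear in the series of means.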


Since the observations are annual averages, there is no seasonality in this series.
However, it is of interest to examine whether there are any long-term trends, or
whether there are any other patterns, such as cycles of years with high and low
average temperatures.


There is considerable variability from year to year. The annual average
temperatures in the late seventeenth century were perhaps lower than at
subsequent times. However, overall there is no obvious pattern. In particular, it is
not clear whether or not there is an upward trend. In this instance, a visual
examination of the time plot yields little definite information about long-term
trends or other patterns such as cyclic variation.


Example 1.2 shows that methods other than graphs may be needed to reveal the
underlying features of a time series. Much of this book is devoted to presenting
appropriate tools with which to reveal such features. Nevertheless, examining a
time plot is an essential first step in analysing a time series. Activity 1.3 will give
you some practice at doing this.

These data were obtained from the Met Office website
http://www.met-office.gov.uk in February 2005.




Activity 1.3 Beer consumption 


Figure 1.3 shows the time plot of quarterly beer consumption in the UK,
measured in thousands of hectolitres, between the first quarter (January to
March) of 1991 and the second quarter (April to June) of 2004.

A hectolitre is one hundred litres.

These data are based on data obtained from the website
http://www.bized.ac.uk/timeweb in February 2005.


Figure 1.3 Quarterly beer consumption (in thousands of hectolitres),
1991-2004


Comment briefly on any patterns you notice in this time series. 


1.2 Components of time series 





In Subsection 1.1, time series were described informally in terms of trends, 
seasonality and cycles. These terms are defined more precisely in this subsection. 


A cycle is a regular pattern that repeats at fixed intervals. The time interval from
the beginning of one cycle to the beginning of the next cycle is called the period
of the cycle. A cycle whose period is known to be determined by the natural clock
(for example, repeating day after day, or year after year) is said to be seasonal.

Some authors also use the term cycle to describe long-term fluctuations that
repeat at intervals of varying length. In this book, a cycle has a fixed period.


A seasonal cycle with period one year is said to be annual. For example,
temperatures and other weather indicators, such as precipitation (which includes
rain, sleet, snow and hail) or daily sunshine hours, display annual seasonality.
Many social and economic time series, such as energy consumption levels and
travel, also display annual seasonality, as do medical time series such as numbers
of colds and numbers of deaths.





In this book, most of the cycles considered are annual seasonal cycles, but it is
important to remember that not all cycles are seasonal and not all seasonality is
annual. An example of a seasonal cycle that is not annual is the circadian cycle,
which affects many biological processes such as body temperature; this has a
period of 24 hours. There are many examples of time series with non-seasonal
cycles. For instance, cases of many infectious diseases occur in regular cycles
(known as epidemic cycles) of period longer than one year. Before vaccination was
introduced, the weekly counts of numbers of cases of measles and whooping cough
in the UK followed cycles with periods of two years and four years, respectively,
so these cycles are not seasonal. Sunspot activity provides another example of a
time series with a non-seasonal cycle. Sunspot activity follows a cycle of period
between ten and eleven years.

Sunspots are dark patches that occur on the Sun.








Another example of a time series with a non-seasonal cycle is given in Activity 1.4. 


Activity 1.4 Blood pressure 


The blood pressure (in mm Hg) of an individual was measured at intervals of two
milliseconds over a period of two seconds. The time plot of blood pressure
measurements is shown in Figure 1.4.

These data were obtained in January 2004 from the website of the European
Society for Hypertension Working Group on Blood Pressure and Heart Rate
Variability http://www.cbi.dongnocchi.it/glossary.

Figure 1.4 Blood pressure (in mm Hg) over time (milliseconds)


(a) How many complete cycles do you observe? Briefly describe the shape of each 
cycle. 





(b) The period of a cycle can be estimated roughly by calculating the average 
time between successive peaks. Use the location of the high peaks in 
Figure 1.4 to estimate the period of the cycle. 
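
The rough estimate described in part (b), the average time between successive peaks, can be written as a short Python function. The name `estimate_period` and the peak times below are hypothetical; the times are in arbitrary units and are not read from Figure 1.4.

```python
def estimate_period(peak_times):
    """Estimate the period of a cycle as the average gap between successive peaks."""
    if len(peak_times) < 2:
        raise ValueError("at least two peaks are needed to estimate a period")
    gaps = [later - earlier
            for earlier, later in zip(peak_times, peak_times[1:])]
    return sum(gaps) / len(gaps)


# Hypothetical peak locations (arbitrary time units, illustration only).
peak_times = [3, 15, 26, 39]
period = estimate_period(peak_times)
print(period)  # gaps are 12, 11 and 13, so this prints 12.0
```

Averaging the gaps is equivalent to dividing the time between the first and last peaks by the number of complete cycles between them.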


For the blood pressure data in Figure 1.4, the shape of the cyclical variation (for
example, when the highs and lows occur) is easy to describe. This is not always
the case. A different type of plot, called a seasonal plot, is often useful for
displaying seasonality. In a seasonal plot for annual seasonality, a separate line is
drawn for each year: for each year, the values $x_t$ of the time series are plotted
against the time of year. This is illustrated in Example 1.3.


Example 1.3 Seasonal plots


The time plot in Figure 1.1 for the time series of visits overseas is reproduced in
Figure 1.5(a).

Figure 1.5 Monthly visits overseas: (a) time plot (b) seasonal plot





The time series clearly has a regular pattern. However, it is not easy to describe 
this pattern from the time plot as the cycles are squashed together. A seasonal 
plot can help in this situation. 


The seasonal plot in Figure 1.5(b) shows the data for each year as a separate line. 
Owing to the upward trend in the data, the lines for later years tend to lie above 
those for earlier years, thus producing the layered effect in the diagram. However, 
all the lines have roughly the same shape, indicating that there is seasonal 
variation of period one year. The numbers of visits overseas are highest in the
summer months, and lowest in the winter months.
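
The data preparation behind a seasonal plot, one line per year, can be sketched as follows. The function name `seasonal_lines` and the quarterly values are hypothetical, and the sketch assumes that the series starts at the first season of `start_year`.

```python
def seasonal_lines(values, period, start_year):
    """Split a series into one list per year, ready to be drawn as separate lines.

    Assumes the first value belongs to the first season of start_year.
    """
    lines = {}
    for i in range(0, len(values), period):
        lines[start_year + i // period] = values[i:i + period]
    return lines


# Hypothetical quarterly values with an upward trend and a third-quarter peak.
quarterly = [10, 14, 20, 12, 13, 17, 23, 15, 16, 20, 26, 18]

lines = seasonal_lines(quarterly, period=4, start_year=2001)
for year, line in lines.items():
    peak_quarter = line.index(max(line)) + 1
    print(year, line, "peak in quarter", peak_quarter)
```

In this made-up series each year's line has the same shape but lies above the previous one, which is the layered effect described for Figure 1.5(b).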





If a time series fluctuates around some fixed value, it is said to have a constant 
mean value or constant level. A time series is said to display a trend if there is a 
gradual change in the mean value or level of the series. Note the word ‘gradual’ 
here: short-term fluctuations or cycles with short periods do not represent trends. 
However, there is a potential difficulty with what is meant by ‘short-term’: how
short is ‘short-term’? It is difficult to be precise about this. In this book,
‘short-term’ is taken to mean short in relation to the length of the data series.
The distinction between cycles and trend is illustrated in Figure 1.6.

Figure 1.6 Trends and cycles


The time series in Figure 1.6(a) has a cycle but no trend, so it has a constant 
level: the time series repeats (with random variation) in successive periods. The 
time series in Figure 1.6(b) shows a trend, in this case an increasing one, but no 
cycle. Figure 1.6(c) shows a time series with both a cycle and a trend: the cyclical 
pattern repeats in successive periods, but at different levels. 








In addition to trends and cycles, time series also display apparently random 
fluctuations. These constitute the irregular component of the time series. The 
irregular component is the main feature of the time plot of annual average 
temperatures in Central England shown in Figure 1.2. 


The various components of a time series are illustrated in Example 1.4. 


Example 1.4 Monthly average house prices in the UK, 1991-2005


Figure 1.7 shows the time series of monthly average house prices in the UK (in
pounds sterling) between January 1991 and January 2005.

These data were obtained in February 2005 from the website of the Nationwide
Building Society www.nationwide.co.uk/hpi.


Figure 1.7 Monthly average house prices (£) in the UK

The dominant feature of this time series is the upward trend after 1996, which
continued until at least the middle of 2004. Prior to 1996, house prices appear 
stable. However, the trend since 1996 is so dominant that it obscures other 
features of the time series. 


Figure 1.8(a) shows the time plot for the first five years, January 1991 to 
December 1995, before the steep upward trend began. 

Figure 1.8 Monthly average UK house prices, 1991-95: (a) time plot (b) seasonal plot


Much more detail is visible in Figure 1.8(a) than in the time plot for the entire 
series in Figure 1.7: the scale on the vertical axis has changed, and as a result, 
small irregularities in Figure 1.7 appear as big peaks and troughs in Figure 1.8(a). 
The irregular component of the series is more apparent than it was in Figure 1.7, 
with clear month-to-month variation in prices. 


The seasonal plot for the first five years, which is shown in Figure 1.8(b), suggests 
that there is some seasonal variation: house prices tend to peak in May and June 
each year. However, the seasonal variation is not very marked, and appears to be 
absent in some years. 


The features evident from Figures 1.7 and 1.8 may be summarized as follows. The 
time series of UK house prices between 1991 and 2005 displays trend, seasonal 
and irregular components. The seasonal component is rather weak. The dominant 
component is a steep upward trend after 1996: house prices roughly trebled
between 1996 and 2005.





Summary of Section 1 





In this section, you have met several examples of time series. Graphical methods 
for presenting time series data have been presented, and the interpretation of time 
plots and seasonal plots has been discussed. The main components of time series 
have been defined, including trend, cyclic and irregular components. 

Exercises on Section 1


Exercise 1.1 Central England temperatures, 1951-2004





The time plot for the time series of annual average temperatures in Central 
England from 1659 to 2004 was shown in Figure 1.2. Figure 1.9 shows the data 
for the last 54 years, that is, for 1951 to 2004. 


Figure 1.9 Annual average temperatures (°C) in Central England,
1951-2004





(a) Explain why this time series has no seasonal component. 


(b) Briefly describe the trend component in this time series, and estimate the 
overall change in the annual average temperature over the 54-year period. 





(c) How important is the irregular component in this time series? Comment 
briefly on the extent of apparently random year-to-year variation in annual 
average temperatures, and on the impact that this has on visualizing the 
trend. 





Exercise 1.2 Seasonality of beer consumption 


A seasonal plot of the quarterly beer consumption data described in Activity 1.3 
is shown in Figure 1.10. 


Figure 1.10 Seasonal plot of quarterly beer consumption (thousands of
hectolitres)


Is beer consumption in the UK seasonal? If you think it is seasonal, describe the 
seasonal variation. 

2 A modelling framework for time series





In Section 1, three components of time series were introduced: the trend 
component, the seasonal (or cyclic) component, and the irregular component. 
From now on, only series with annual seasonal cycles (or no cycles at all) are 
considered. In Subsection 2.1, a basic model for such time series is described. 
Methods for deciding whether this model is appropriate are discussed in 
Subsection 2.2. When the model is not appropriate, a transformation of the time 
series can sometimes be found that produces a time series for which the model 
may validly be used. Transformations of time series are discussed in
Subsection 2.3.





2.1 Models for time series 


The trend, seasonal and irregular components described in Section 1 were 
introduced to describe the main features of a time series. In fact, thinking of a 
time series in terms of its constituent parts is a fruitful approach, which can be 
used to develop models for time series. The idea is illustrated in Example 2.1. 


Example 2.1 Building up a time series 


A time series can be thought of as being built up from its trend, seasonal and 
irregular components. For example, consider the data in Table 2.1. 


Table 2.1 Components of a time series


Time t and    Trend       Seasonal    Irregular
season        component   component   component   Total
1 Spring      100         +20         +6.4        126.4
2 Summer      110         -35         -9.2         65.8
3 Autumn      120         -15         +4.2        109.2
4 Winter      130         +30         +12.6       172.6
5 Spring      140         +20         -5.0        155.0
6 Summer      150         -35         +5.6        120.6
7 Autumn      160         -15         -6.0        139.0
8 Winter      170         +30         +2.2        202.2


These data are artificial, but might plausibly represent sales of a particular 
product for which demand varies seasonally, being higher in the winter and spring 
than in the summer and autumn. 


The time t is listed in the first column, together with the season to which it 
corresponds. These data are quarterly. The trend component is in the second
column. This describes how the level changes over time. In this example, the level 
increases linearly by 10 units each quarter. 





The third column contains the seasonal component. The values repeat every four
quarters: spring sales are boosted by 20 units, summer and autumn sales are
depleted by 35 and 15 respectively, and winter sales are boosted by 30. Thus the
values of the seasonal components for the four quarters, starting with spring, are
+20, -35, -15 and +30. These values are called the seasonal factors. The
seasonal factors of a time series sum to zero over one year: in this example,
20 - 35 - 15 + 30 = 0. Note that the seasonal factors represent seasonal
departures from the underlying level of sales.


The fourth column contains the irregular component. This is assumed to vary
randomly around zero, according to some distribution. Adding together the three
components (trend, seasonal and irregular) gives the overall value $x_t$, which is
shown in the final column of Table 2.1.

This time series, with its various components, is shown in Figure 2.1.


Figure 2.1 A time series built up from its components





The trend component is represented by the upward-sloping line in the centre of
the diagram. The seasonal factors are represented as vertical lines, indicating the
departures from the trend. The irregular component is manifested by the vertical
distances between the points in the time series and the tips of the vertical
lines.
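
The arithmetic of Example 2.1 can be checked with a few lines of Python. The component values are those of Table 2.1; the function name `build_series` is a hypothetical choice for this sketch.

```python
# Component values taken from Table 2.1.
trend = [100, 110, 120, 130, 140, 150, 160, 170]
seasonal_factors = [20, -35, -15, 30]  # spring, summer, autumn, winter
irregular = [6.4, -9.2, 4.2, 12.6, -5.0, 5.6, -6.0, 2.2]


def build_series(trend, seasonal_factors, irregular):
    """Add trend, seasonal and irregular components at each time point.

    The seasonal factors are reused cyclically, one cycle per period.
    Totals are rounded to one decimal place, as in Table 2.1.
    """
    period = len(seasonal_factors)
    return [round(m + seasonal_factors[i % period] + w, 1)
            for i, (m, w) in enumerate(zip(trend, irregular))]


totals = build_series(trend, seasonal_factors, irregular)
print(totals)
# [126.4, 65.8, 109.2, 172.6, 155.0, 120.6, 139.0, 202.2]
```

The printed totals agree with the final column of Table 2.1.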








The idea illustrated in Example 2.1 of building up a time series from its 
constituent parts can be described in general terms as follows. 


The trend component of a time series is denoted $m_t$: this describes how the level
of the time series varies with $t$.

The level and the trend component are closely related: the level at time $t$ is the
value of the trend component at time $t$.

The seasonal component is denoted $s_t$ and its period $T$. The values of the
seasonal component repeat every $T$ time points. Thus, if the time points $t$
represent months, then $T = 12$ and

$$s_t = s_{t+12} \quad \text{for all } t.$$

If the time points $t$ represent quarters, then $T = 4$ and

$$s_t = s_{t+4} \quad \text{for all } t.$$

In general, for a seasonal component with period $T$,

$$s_t = s_{t+T} \quad \text{for all } t.$$


Thus $s_t$ takes only $T$ different values, $s_1, s_2, \ldots, s_T$. These values are called the
seasonal factors. These $T$ seasonal factors represent departures from the trend,
and sum to zero over a period:

$$s_1 + s_2 + \cdots + s_T = 0.$$

All the seasonal cycles analysed in this book are annual, so the seasonal factors
sum to zero over a year.

The sum of the trend component and the seasonal component gives the
systematic (non-random) part of the time series, which is denoted $\mu_t$. Thus

$$\mu_t = m_t + s_t.$$

At each time point $t$, $x_t$ may be regarded as an observation on some random
variable $X_t$ with mean $E(X_t) = \mu_t$. A simple model for $X_t$ is

$$X_t = \mu_t + W_t,$$

where $W_t = X_t - \mu_t$ is a random variable with mean zero and variance $\sigma^2$. Note
that in this model it is assumed that $X_t$ has constant variance $\sigma^2$: $V(X_t) = \sigma^2$ for
all $t$. The random variable $W_t$ corresponds to the irregular component of the time
series. Thus the overall model is

$$X_t = m_t + s_t + W_t.$$

Upper-case letters $X, Y, Z, W$ are used to represent random variables, and
lower-case letters $x, y, z, w$ are used to represent particular values of those
random variables.


The main difference between this model and models in other areas of statistics is
that, for different time points $t$, the random variables $W_t$ cannot generally be
assumed to be independent.





This model is called the additive decomposition model. It is a decomposition
model because it is based on a decomposition of the time series into distinct
components, and additive because the various components are added together.
Note that this model can also be used to represent time series with constant level
(that is, models for which $m_t$ is constant, $m_t = m$ for all $t$) and time series
with no seasonal component, for which $s_t = 0$ for all $t$. The model is summarized
in the following box.


The additive decomposition model

The additive decomposition model for a time series $X_t$ is

$$X_t = m_t + s_t + W_t,$$

where $m_t$ is the trend component, $s_t$ is the seasonal component of
period $T$, and $W_t$ is the irregular (or random) component.

The seasonal component satisfies

$$s_t = s_{t+T} \quad \text{for all } t,$$

$$s_1 + s_2 + \cdots + s_T = 0.$$

The distinct values $s_1, \ldots, s_T$ are the seasonal factors.

The irregular component $W_t$ has mean 0 and variance $\sigma^2$:

$$E(W_t) = 0, \quad V(W_t) = \sigma^2.$$
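
The two conditions on the seasonal component in the box (repetition with period $T$, and seasonal factors summing to zero) can be checked mechanically. The following Python sketch uses a hypothetical function name, `is_additive_seasonal`; apart from the seasonal factors of Table 2.1, the sequences are made up for illustration.

```python
def is_additive_seasonal(values, period, tol=1e-9):
    """Check the two conditions on an additive seasonal component.

    Returns True only if the values repeat with the given period and the
    first `period` values (the seasonal factors) sum to zero.
    """
    if len(values) < period:
        raise ValueError("at least one full period of values is needed")
    repeats = all(abs(values[i] - values[i % period]) <= tol
                  for i in range(len(values)))
    sums_to_zero = abs(sum(values[:period])) <= tol
    return repeats and sums_to_zero


# Seasonal factors from Table 2.1, repeated over two years: both conditions hold.
table_2_1 = [20, -35, -15, 30, 20, -35, -15, 30]

# Hypothetical sequences, for illustration only.
repeats_but_sum_is_one = [5, -5, 2, -1, 5, -5, 2, -1]
does_not_repeat = [1, -1, 2, -2, 2, -2, 1, -1]

print(is_additive_seasonal(table_2_1, period=4))
print(is_additive_seasonal(repeats_but_sum_is_one, period=4))
print(is_additive_seasonal(does_not_repeat, period=4))
```

Both conditions are needed: the second sequence repeats every four values but its factors sum to 1, while the third sums to zero over its first four values but does not repeat.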


Activity 2.1 Seasonal factors 


One of the following sequences of numbers represents the first few values of the 
seasonal component for a quarterly additive decomposition time series model. The 
others do not. For each sequence, state whether or not it represents the seasonal 
component, giving a reason for your answer. 


(a) -2, +4, +8, -5, -3, +5, +4, -6, ....
(b) +3, -1, -2, 0, +3, -1, -2, 0, ....
(c) -4, +3, +8, -1, -4, +3, +3, -1, ....

2.2 Additive models and multiplicative models


If the random variable $X_t$ is necessarily positive, as is often the case, then an
alternative way of combining the trend component $m_t$, the seasonal component $s_t$
and the irregular component $W_t$ is to multiply them together:

$$X_t = m_t \times s_t \times W_t.$$

This model is called the multiplicative decomposition model. The systematic
component is $\mu_t = m_t s_t$ and the irregular component $W_t$ is $X_t/\mu_t$. The seasonal
component of a multiplicative model is defined differently from that of the
additive model. For the multiplicative model,

$$s_t = s_{t+T} \quad \text{for all } t,$$

$$s_1 \times s_2 \times \cdots \times s_T = 1.$$





Time series generated by additive models and multiplicative models often look 
quite different. This is illustrated in Figure 2.2. 

Figure 2.2 Time series generated by decomposition models: (a) additive (b) multiplicative


For the time series in Figure 2.2(a), which was generated using an additive model,
the seasonal fluctuations do not vary in size systematically with the value of $m_t$,
the level of the time series at time $t$. Thus the seasonal fluctuations are of roughly
the same size whether the level $m_t$ is large or small. Similarly, the irregular
fluctuations are roughly of the same size, whatever the value of the systematic
component $m_t + s_t$.

The word ‘fluctuations’ is used here in a [...] sense.


In contrast, for the time series in Figure 2.2(b), the size of the seasonal
fluctuations is proportional to $m_t$, the level at time $t$: the larger the value of $m_t$,
the larger are the seasonal fluctuations. Note that the change in the size of the
seasonal fluctuations is not due to a change in the seasonal component $s_t$. It
arises because $m_t$ and $s_t$ are multiplied together. Similarly, the size of the
irregular fluctuations is proportional to $m_t \times s_t$: the larger this is, the larger are
the irregular fluctuations.
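The contrast between the two structures can be seen numerically. The sketch below, in Python, uses invented quarterly seasonal factors and two invented trend levels (none of these numbers come from the text) and measures the size of the seasonal swing over one cycle, ignoring the irregular component.

```python
def seasonal_swing(cycle):
    """Size of the fluctuation over one cycle: largest value minus smallest."""
    return max(cycle) - min(cycle)


# Hypothetical quarterly seasonal factors, invented for illustration.
additive_factors = [-10.0, 5.0, 10.0, -5.0]      # these sum to 0
multiplicative_factors = [0.5, 2.0, 0.8, 1.25]   # these multiply to 1

for level in (100.0, 200.0):
    additive_cycle = [level + s for s in additive_factors]
    multiplicative_cycle = [level * s for s in multiplicative_factors]
    print(level,
          seasonal_swing(additive_cycle),
          seasonal_swing(multiplicative_cycle))
```

With these numbers the additive swing is the same at both levels, whereas the multiplicative swing doubles when the level doubles.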





These differences between the appearance of time series with an additive structure 
and time series with a multiplicative structure may be used to identify when an 
additive model is appropriate. If an additive model is not appropriate, then a 
multiplicative model may be, although it is possible that neither is appropriate.

Example 2.2 Is an additive model appropriate?


Clostridium difficile is a bacterium that causes diarrhoea. It is of particular
concern in hospitals. Figure 2.3(a) shows the time plot for the weekly number of
reported cases of infection by Clostridium difficile in England and Wales between
the middle of 1996 and the middle of 2003.

These data and other data in this book on infections in the UK were provided by
the Health Protection Agency Centre for Infections, London.



Figure 2.3 Weekly reports of infections, 1996-2003: (a) Clostridium difficile (b) Salmonella Typhimurium DT104 


There is an increasing trend (which is most likely due at least in part to 
improvements in the reporting of infections). There is also some evidence of 
seasonality, as shown by the peaks separated by a period of about 52 weeks. The 
magnitude of the seasonal fluctuations, and the size of the irregular fluctuations, 
do not appear to vary according to the level of the series: they are not markedly 
greater at the later end of the data series (when the average weekly number of 
reports is about 500) than at the beginning (when there are about 250 reports per 
week). This suggests that an additive model may be appropriate for this time 
series. 








Salmonella Typhimurium DT104 is a bacterium that causes food poisoning.
Figure 2.3(b) shows the time plot for the weekly number of reports of Salmonella
Typhimurium DT104 between the middle of 1996 and the middle of 2003. In this
case there is a broadly downward trend, from about 100 reports per week on
average to about 10 per week. There is also marked seasonality, as shown by the
peaks at intervals of about 52 weeks. However, in this series, the seasonal
fluctuations appear to be greater in size in the earlier part of the series, when the
weekly numbers of reports are larger, than in the later part of the series. The
irregular week-to-week variation may also be greater during the earlier part of the
series, though it is more difficult to be sure of this from the plot. Nevertheless, it
is unlikely that an additive model will be appropriate for this series. On the other
hand, a multiplicative model may be appropriate, though this is not
guaranteed. ♦





As in many other areas of statistics, deciding whether or not a particular model is 
appropriate is as much an art as a science! Activity 2.2 will give you some practice 
in deciding whether or not an additive decomposition model is appropriate.

Activity 2.2 Travel and temperatures: choosing a model


The time plots of two time series that were introduced in Section 1 are 
reproduced in Figure 2.4. 


Figure 2.4 (a) Visits overseas by UK residents (b) Annual average temperatures (°C)


Figure 2.4(a) shows the (highly seasonal) time series of monthly visits overseas by 
UK residents. Figure 2.4(b) shows the time series of annual temperatures in 
Central England, 1951-2004; this time series is non-seasonal. 


For each of these time series, discuss whether or not an additive model is likely to 
be appropriate. 


2.3 Transforming time series 





Consider the multiplicative model $X_t = m_t \times s_t \times W_t$. Let $Y_t$ denote the time
series of logarithms: $Y_t = \log X_t$. Then

$$\begin{aligned}
Y_t &= \log X_t \\
    &= \log(m_t \times s_t \times W_t) \\
    &= \log m_t + \log s_t + \log W_t.
\end{aligned}$$

Thus the model for $Y_t$ is additive, with trend component $\log m_t$, seasonal
component $\log s_t$ and irregular component $\log W_t$. It follows that if a
multiplicative model is appropriate for the time series $X_t$, then an additive model
is appropriate for the time series of logarithms, $Y_t = \log X_t$. Thus, by taking
logarithms, a time series for which a multiplicative model is appropriate can be
transformed into a time series for which an additive model is appropriate.
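The identity behind the log transformation can be checked with a few lines of Python. This is only a sketch: the values of the trend, seasonal and irregular components and the seasonal factors below are invented for illustration.

```python
import math

# Hypothetical component values at a single time point (invented).
m_t, s_t, w_t = 120.0, 1.25, 0.96
x_t = m_t * s_t * w_t

# Taking natural logarithms turns the product into a sum.
y_t = math.log(x_t)
additive_form = math.log(m_t) + math.log(s_t) + math.log(w_t)
print(math.isclose(y_t, additive_form))

# Multiplicative seasonal factors with product 1 become
# additive seasonal factors with sum 0 after taking logarithms.
multiplicative_factors = [0.5, 2.0, 0.8, 1.25]
log_factors = [math.log(s) for s in multiplicative_factors]
print(math.isclose(sum(log_factors), 0.0, abs_tol=1e-12))
```

The second check shows why the constraint for a multiplicative seasonal component (product equal to 1) corresponds to the constraint for an additive one (sum equal to 0).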


Example 2.3 Rotavirus infections 


Rotavirus infection causes diarrhoea and, in poor countries, is a major cause of 
infant and child mortality. The time series of weekly reports of rotavirus infection 
in England and Wales from the middle of 1996 to the middle of 1998 is shown in 
Figure 2.5.

The time plots in Figure 2.4 were shown previously as Figures 1.1 and 1.9.

Natural logarithms (that is, to base e) are used throughout this book.

Figure 2.5 Time series of weekly reports of rotavirus infection


This time series does not display any clear upward or downward trend, but there 
is substantial seasonality. Week 14 of each year corresponds to early April, and 
hence the seasonal peak occurs in late winter and early spring. Since there is no 
trend, a decision as to whether or not an additive model might be appropriate 
must be based solely on the irregular component. 








The seasonal effect dominates in Figure 2.5. Nevertheless, it is apparent that the 
irregular fluctuations are less marked in the troughs than during the seasonal 
peaks: the big gashes at the tops of the peaks are much deeper than the little 
wiggles at the bottoms of the troughs. Thus an additive model is likely to be 
inappropriate for this time series. 


Is a multiplicative model appropriate? This can be investigated by transforming
the time series using logarithms: each value $x_t$ is replaced by its logarithm. The
time series of logarithms of weekly reports of rotavirus infection is shown in
Figure 2.6.

Figure 2.6 Logarithms of weekly reports of rotavirus infection


The transformed time series displays the same seasonal periodicity as the original 
time series shown in Figure 2.5. However, the irregular fluctuations are now
roughly of the same magnitude wherever they occur in the series: in particular,
they are roughly of the same size in the troughs and at the peaks of the seasonal
cycle.


This suggests that an additive model is appropriate for the log-transformed
series. It follows that a multiplicative model is appropriate for the original,
untransformed, rotavirus time series. ♦


Activity 2.3 Visits overseas: logarithms 


In Activity 2.2, you found that an additive model is not appropriate for the time
series of monthly numbers of visits overseas by UK residents, because the size of
the seasonal fluctuations increases with the level of the series. The time series of
logarithms of numbers of visits overseas is shown in Figure 2.7. (The logarithms
are taken of the numbers in thousands.)

Figure 2.7 Logarithms of numbers of visits overseas by UK residents


(a) Is an additive model appropriate for the time series shown in Figure 2.7? 
Explain your reasoning. 


(b) What do you conclude about the validity of using either an additive model or 
a multiplicative model to represent the time series of monthly visits overseas? 


Activity 2.3 shows that some time series cannot be described adequately either by 
an additive model or by a multiplicative model. One way of overcoming this 
difficulty is to extend the class of models available. For example, consider the 
model 


$$X_t = m_t \times s_t + W_t.$$


This model has a multiplicative seasonal component and an additive irregular 
component, so it is neither purely additive nor purely multiplicative. There are 
many other possibilities.

A simpler approach is to try to find a transformation other than the log
transformation which, when applied to the time series, produces a series for which
an additive model can validly be used. This is the approach that will be used in
this book. Transformations that are commonly used include the power
transformations

$$Y_t = X_t^a$$

for various values of the power $a$; for example, $a = \tfrac{1}{2}$ gives the square root
transformation.


It is important to emphasize that it is not always possible to find a 
transformation such that the transformed series may be represented adequately 
by an additive model. 
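A power transformation is simple to apply in code. The sketch below is in Python; the function name `power_transform` and the example series are assumptions made for illustration, and the series is invented so that its swings grow with its level.

```python
def power_transform(values, a):
    """Apply y = x**a to each value of a series of positive numbers."""
    return [x ** a for x in values]


# Hypothetical series: a low-level pair and a high-level pair of values.
series = [100.0, 144.0, 400.0, 576.0]
square_roots = power_transform(series, 0.5)

print(series[1] - series[0], series[3] - series[2])
print(square_roots[1] - square_roots[0], square_roots[3] - square_roots[2])
```

In this invented series the swing at the higher level is four times the swing at the lower level; after taking square roots it is only about twice as large. Whether any particular power makes an additive model appropriate for a real series still has to be judged from the time plot of the transformed series.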





Example 2.4 Visits overseas: square roots 


Figure 2.8 shows the time series of square roots of the numbers of visits overseas
by UK residents: $y_t = x_t^{1/2}$. (The square roots are taken of the numbers in thousands.)

Figure 2.8 Square roots of numbers of visits overseas by UK residents


The seasonal fluctuations are of roughly the same size whatever the level of the 
series. However, it is not possible to tell from Figure 2.8 whether this is also true 
of the irregular fluctuations. Thus it is not possible to conclude that an additive 
model is appropriate for this transformed series, only that it may be 
appropriate. ♦

Activity 2.4 Transforming the salmonella time series


In Figure 2.3(b), the time series of weekly reports of Salmonella Typhimurium 
DT104 was presented. The time series was discussed in Example 2.2, where it was 
concluded that an additive model is unlikely to be appropriate as the seasonal 
fluctuations vary with the level of the series. However, it was not clear whether 
the irregular fluctuations also vary with the level. Two transformations of this 
time series are shown in Figure 2.9.

Figure 2.9 Transformations of weekly reports of Salmonella Typhimurium DT104: (a) logarithm (b) square root





(a) Describe the effect of the log transformation on the seasonal and irregular 
variation. 


(b) Describe the effect of the square root transformation on the seasonal and 
irregular variation. 





(c) In your opinion, which of the two time series in Figure 2.9 would it be more 
appropriate to describe by an additive model? 


Summary of Section 2 


In this section, the additive decomposition model and the multiplicative 
decomposition model have been introduced. You have learned how to use a time 
plot to decide whether or not an additive model is appropriate for a time series. 
You have seen that the log transformation transforms a multiplicative model into 
an additive model. The use of transformations other than the log transformation 
to produce a series for which an additive model may be appropriate has been 
discussed briefly.


Exercises on Section 2


Exercise 2.1 Beer consumption 


The time series of quarterly beer consumption in the UK, 1991-2004, was 
introduced in Activity 1.3. The time plot of this series is shown in Figure 1.3. 


Discuss whether an additive model is appropriate for these data. 


Exercise 2.2 Company sales 


The time plot of monthly sales by a company between January 1965 and
May 1971 is shown in Figure 2.10. (The company name and the units in which
sales are measured have been concealed.)

Figure 2.10 Monthly sales figures, 1965-1971

O'Donovan, T.M. (1983) Short Term Forecasting: An Introduction to the
Box-Jenkins Approach. Wiley, Chichester.

(a) Is an additive model suitable for this time series? Explain your reasoning. 


Two transformations of the time series are shown in Figure 2.11.

Figure 2.11 Transformed sales figures: (a) logarithm (b) square root


(b) Figure 2.11(a) shows the time plot of the logarithms of the monthly sales. 
The time plot of the square roots of the monthly sales is shown in 
Figure 2.11(b). In your opinion, which of the two time series in Figure 2.11 
would it be more appropriate to describe by an additive model?


3 Displaying time series data in SPSS


In this section, you will learn how to enter, display and transform time series data
in SPSS.


Refer to Chapter 1 of Computer Book 2 for the work in this section.





Summary of Section 3 


In this section, you have learned how to enter time series data in SPSS and how 
to define dates. You have also learned how to obtain time plots, and how to 
transform time series. You have used these methods to investigate whether an 
additive or a multiplicative model might be appropriate for a time series. 


4 Estimating the components of a time series 


In this section, a method for estimating the components of a time series that can 
be described by an additive model is discussed. In Subsection 4.1, a technique for 
estimating the trend component of a non-seasonal time series is introduced. In 
Subsection 4.2, this technique is modified and applied to seasonal time series; this 
leads to a method for estimating the seasonal component of a seasonal time series. 
In Subsection 4.3, the techniques described in Subsections 4.1 and 4.2 are 
combined to give a method for decomposing a time series into seasonal, trend and 
irregular components. 


4.1 Estimating the trend component 


In this subsection, time series that can be described by an additive model with 
only a trend component and an irregular component are considered. This model, 
which does not have a seasonal component, is the non-seasonal additive 
model: 


$$X_t = m_t + W_t.$$


Sometimes the trend in a time series is clear from looking at a time plot.
However, when this is not the case, in order to describe the trend, the trend
component $m_t$ must be estimated.


The trend may be obscured by the fluctuations of the irregular component $W_t$,
especially if the variation in $m_t$ is small compared to $\sigma$, the standard deviation
of $W_t$. In this context, the irregular component is sometimes referred to as noise.
The larger the standard deviation of $W_t$, the more the irregular component
obscures the trend, and the noisier the series is said to be.











One way to reduce the noise is to replace each value $x_t$ with the average of $x_t$ and
its neighbouring values. This is illustrated in Example 4.1.

Example 4.1 Reducing noise





The time series of annual average temperatures in Central England, 1659-2004,
was shown in Figure 1.2. The time plot is reproduced in Figure 4.1(a).

Figure 4.1 Annual average temperatures (°C), Central England


This is a very noisy series: the irregular year-to-year fluctuations are large and 
tend to obscure the underlying trend. To reduce the noise, the series was 
transformed as follows. For each year $t$, the value $x_t$ was replaced by the average
of eleven values, corresponding to the temperatures in year t, in the five years 
preceding year t, and in the five years following year t. Thus, for example, the 
value for 1999 was replaced by the average of the values for years 1994 to 2004. 
This transformed series is shown in Figure 4.1(b). 








The time plot in Figure 4.1(b) is much less jagged than the time plot of the 
original series in Figure 4.1(a). So it is easier to identify underlying patterns from 
Figure 4.1(b) than from Figure 4.1(a). These patterns can now be summarized as 
follows. Temperatures were unusually low prior to about 1700, they fluctuated 
around an average value that remained roughly constant from 1700 to 1900, and 
they tended to increase after about 1900. ♦


The transformation described in Example 4.1 is called a simple moving 
average (or just a moving average). It can be written as 


$$Y_t = \tfrac{1}{11}\,(X_{t-5} + \cdots + X_t + \cdots + X_{t+5}). \qquad (4.1)$$


This transformation has the effect of reducing the standard deviation of the 
irregular component, hence producing a less jagged plot. This process of noise 
reduction is called smoothing. 


Since eleven values are used in the expression in (4.1), this transformation is said 
to be a moving average of order (or span) 11. For the purpose of smoothing time 
series, only moving averages for which the order is an odd number will be used. 
These are said to be centred on the middle value.

The definition of a simple moving average of arbitrary odd order is given in the
following box.


Simple moving average


The simple moving average of order $2q + 1$ centred on $t$ is given by the
transformation

$$MA(t) = \frac{1}{2q+1}\,(X_{t-q} + \cdots + X_{t-1} + X_t + X_{t+1} + \cdots + X_{t+q}). \qquad (4.2)$$


Example 4.2 A moving average of order 5


The first five values of the Central England temperatures time series are
reproduced in Table 4.1.


Table 4.1 Five values of the Central England temperatures time series

Year    Average temperature
1659    8.83
1660    9.08
1661    9.75
1662    9.50
1663    8.58


For a simple moving average of order 5, $q = 2$, so the moving average value for the
year 1661 is

$$\begin{aligned}
y_{1661} &= \tfrac{1}{5}\,(x_{1661-2} + x_{1661-1} + x_{1661} + x_{1661+1} + x_{1661+2}) \\
         &= \tfrac{1}{5}\,(8.83 + 9.08 + 9.75 + 9.50 + 8.58) \\
         &\simeq 9.15.
\end{aligned}$$

Activity 4.1 Calculating a simple moving average 


(a) Use a simple moving average of order 3, and the data in Table 4.1, to 
calculate moving average values for the years 1660, 1661 and 1662. 


(b) Explain why the values of a simple moving average of order 3 cannot be 
calculated for the years 1659 and 1663 from the data in Table 4.1. 


(c) For which years can a moving average of order 5 be calculated from the data 
in Table 4.1? 


In Activity 4.1, you found that, for a moving average of order 3, the values 
corresponding to the first time point and the last time point cannot be calculated. 
In general, for a moving average of order 2q + 1, the values for the first q and the 
last q time points cannot be calculated. In practice, this is not a problem as the 
order of the moving average is usually much shorter than the time series. 


The smoothing effect of moving averages of different orders is illustrated in 
Example 4.3.


Example 4.3 Temperature of a chemical process


Figure 4.2 shows the time series of temperatures, in degrees Fahrenheit, of a
chemical process at intervals of two minutes.

Figure 4.2 Temperature (°F) of a chemical process over time (minutes)


Moving averages of orders 3 and 21 are used to smooth this time series. The
results are shown in Figure 4.3.

Figure 4.3 Moving averages: (a) order 3 (b) order 21

Note that the vertical scale used in Figures 4.3(a) and 4.3(b) is the same as the
one used for the unsmoothed series in Figure 4.2. Using the same scale for the
moving averages as for the original time series is essential in order to assess the
degree of smoothing.


Compare Figure 4.3(a) with Figure 4.2. Notice that the moving average of order 3 
has reduced the size of the fluctuations. However, substantial noise remains in the 
time series, and the underlying trend is still not very clear. The moving average of 
order 21 in Figure 4.3(b) has smoothed the series much more drastically. The 
remaining fluctuations are much smaller, and the overall trend is now much 
clearer. The trend can be summarized as follows: the mean temperatures dropped 
over the first hour, then rose for about three-quarters of an hour, before dropping 


again. ♦

The higher the order of the moving average that is used, the greater is the
smoothing effect produced. With a suitable degree of smoothing (that is, with a
suitable choice of the order of the moving average) the moving average provides
an estimate of the trend component $m_t$; this is denoted $\hat{m}_t$. Thus

$$\hat{m}_t = \frac{1}{2q+1}\,(X_{t-q} + \cdots + X_t + \cdots + X_{t+q}).$$

Choosing the right amount of smoothing (that is, choosing an appropriate value
for the order $2q + 1$ of the moving average) is not a simple task, and is to a large
extent a matter of judgement. If the order of a moving average is too low, then
not enough of the noise will be removed to reveal the underlying trend clearly;
this is described as under-smoothing the data. On the other hand, if the order
of a moving average is too high, there is a risk that variations of interest in the
trend itself will be ironed out. This is described as over-smoothing the data.


To see how under-smoothing and over-smoothing arise, consider again the
non-seasonal additive model

$$X_t = m_t + W_t.$$

When the observed values of $X_t$ are averaged, both the trend component $m_t$ and
the irregular component $W_t$ are averaged. Averaging the values of $W_t$ reduces the
standard deviation of the irregular term, and hence reduces the noise; this result
of smoothing is desirable. However, by averaging the trend component, some
important detail of the trend may be lost, and this would be undesirable.
Under-smoothing arises when the noise is not sufficiently reduced to reveal the
trend; over-smoothing arises when important detail in the trend is smoothed out.
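
The effect of averaging on the noise can be made explicit under an additional assumption that is not stated in the text at this point: that the irregular terms $W_t$ are uncorrelated with each other, each with variance $\sigma^2$. Under that assumption, the irregular part of a simple moving average of order $2q + 1$ has variance

$$V\!\left(\frac{1}{2q+1}\sum_{j=-q}^{q} W_{t+j}\right) = \frac{1}{(2q+1)^2}\sum_{j=-q}^{q} V(W_{t+j}) = \frac{(2q+1)\,\sigma^2}{(2q+1)^2} = \frac{\sigma^2}{2q+1},$$

so its standard deviation is $\sigma/\sqrt{2q+1}$. For a moving average of order 11, for instance, this is about $0.30\sigma$. If the irregular terms are correlated, the reduction will generally differ from this figure.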





The aim is to smooth by just the right amount, so as to eliminate most of the 
noise while leaving the trend largely unaffected. Unfortunately, a general rule for 
achieving this is not readily available. It is usually a good idea to try moving 
averages of several different orders before attempting to summarize an underlying 
trend. Activity 4.2 will give you some practice at choosing how much to smooth a 
time series. 


Activity 4.2 Choosing the order of a moving average 


Figure 4.4 shows 197 successive readings of the concentration level of a chemical
process, taken at two-hourly intervals.

Box, G.E.P. and Jenkins, G.M. (1976) Time Series Analysis: Forecasting and Control.
Holden-Day, San Francisco.

Figure 4.4 The concentration level of a chemical process


The time series is non-seasonal, but very noisy.

Moving averages of orders 51, 25, 11 and 3 were used to smooth the series. The
results are shown in Figure 4.5.


Figure 4.5 The concentration level data smoothed using moving averages (four panels, (a) to (d), each plotting the moving average against time in hours)


(a) Which moving average was used to produce the smoothed series in each of 
the four time plots shown in Figure 4.5? 





(b) In your view, which of the moving averages result in under-smoothing of the 
time series? Which produce over-smoothing? Which smooth the series by 
about the right amount? 


4.2 Estimating the seasonal component 








In Subsection 4.1, simple (centred) moving averages were used to estimate the 
trend component of a non-seasonal time series. In this subsection, moving 
averages will be used to estimate the seasonal component of a seasonal time series. 


Consider a seasonal time series $X_t$ that can be described using the additive model

$$X_t = m_t + s_t + W_t.$$

The first step is to find an initial estimate of the trend component $m_t$ that is not
unduly influenced by the seasonal component $s_t$. A reasonable starting point
would be to use a simple moving average with order equal to the period $T$ of the
seasonal cycle. Such a moving average would smooth out the seasonal variation,
as the annual highs and lows would cancel out.


(The notation $X_t$ is used both for a time series and for the random
variable representing the element of the series at time $t$.)

For example, suppose that the data are quarterly, so that $T = 4$, and a smoothed
value is required for summer 2006. The period is an even number, but to centre
the moving average on summer 2006, an odd order is required. A simple moving
average of order 5 (say) would average over winter, spring, summer and
autumn 2006 and winter 2007: thus winter would be counted twice. This problem
can be overcome by using a different type of moving average, called a weighted
moving average. In this example, if the two winter values are given half the
weight of the other values, then each season is equally weighted. In general, for a
quarterly time series $X_t$, the following transformation is used:

$$SA(t) = \tfrac{1}{4}\,(0.5X_{t-2} + X_{t-1} + X_t + X_{t+1} + 0.5X_{t+2}). \qquad (4.3)$$

(The S in the notation $SA(t)$ is for Seasonal.)

Similarly, if the series $X_t$ is monthly, so that $T = 12$, the following transformation
is used:

$$SA(t) = \tfrac{1}{12}\,(0.5X_{t-6} + X_{t-5} + \cdots + X_t + \cdots + X_{t+5} + 0.5X_{t+6}). \qquad (4.4)$$

This ensures that each month is equally weighted.


Since the seasonal factors add up to zero over a seasonal cycle, when applied to an
additive time series with seasonal component $s_t$, the transformations $SA(t)$
in (4.3) and (4.4) give the same result as they would if there were no seasonal
component. Thus these transformations remove the seasonal effect. This is
illustrated in Example 4.4.


Example 4.4 Visits overseas: smoothing out the seasonal component

In Example 2.4, you saw that an additive model may be appropriate for the
monthly time series of square roots of numbers of visits overseas by UK residents.
The time plot for this time series is reproduced in Figure 4.6(a).

Figure 4.6 Square roots of numbers of visits overseas: (a) original data (b) after applying the moving average


The time series displays very marked seasonality. Figure 4.6(b) shows the time 
series obtained by applying the transformation (4.4) to this time series. The 
seasonal variation has been smoothed out of the series, leaving the (smoothed) 
underlying trend and some irregular variation from month to month. 








Since the values $s_t$ repeat every twelve time points, and $s_1 + s_2 + \cdots + s_{12} = 0$,
for each $t$,

$$0.5s_{t-6} + s_{t-5} + \cdots + s_t + \cdots + s_{t+5} + 0.5s_{t+6} = 0.$$

Hence the transformation is effective in removing the seasonal component. ♦

The transformations (4.3) and (4.4) are examples of weighted moving averages.
The general weighted moving average of odd order is defined in the following box.


Weighted moving average


A weighted moving average of order $2q + 1$ has the form

$$MA(t) = a_{-q}X_{t-q} + \cdots + a_{-1}X_{t-1} + a_0X_t + a_1X_{t+1} + \cdots + a_qX_{t+q},$$

where the weights $a_j$, $j = -q, \ldots, q$, add up to 1.

(The weighted moving averages in (4.3) and (4.4) are denoted $SA(t)$ rather than $MA(t)$
because of their particular use in smoothing out seasonal variation: the S is for Seasonal.)


The simple moving average (4.2), which was introduced in Subsection 4.1, is a
weighted moving average in which the weights $a_j$ are all equal to $(2q+1)^{-1}$. Just
as simple moving averages can be used to smooth time series, so can weighted
moving averages.
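
A weighted moving average, and the particular weights used in (4.3) and (4.4), can be sketched in Python as follows. The function names `seasonal_weights` and `weighted_moving_average`, and the quarterly seasonal factors used in the demonstration, are assumptions made for illustration only.

```python
def seasonal_weights(period):
    """Weights of SA(t) for an even seasonal period, as in (4.3) and (4.4)."""
    if period % 2 != 0:
        raise ValueError("this sketch assumes an even seasonal period")
    # Half weight at both ends, full weight in between, all divided by T.
    return [0.5 / period] + [1.0 / period] * (period - 1) + [0.5 / period]


def weighted_moving_average(values, weights):
    """Apply weights a_{-q}, ..., a_q; entries are None where undefined."""
    q = len(weights) // 2
    result = [None] * len(values)
    for t in range(q, len(values) - q):
        window = values[t - q : t + q + 1]
        result[t] = sum(a * x for a, x in zip(weights, window))
    return result


# Hypothetical quarterly seasonal factors (sum to 0), repeated for 3 years.
seasonal = [4.0, -1.0, -3.0, 0.0] * 3
weights = seasonal_weights(4)

print(weights, sum(weights))
print(weighted_moving_average(seasonal, weights))
```

The weights add up to 1, as the definition requires, and applying them to a purely seasonal series gives zero at every time point where the average is defined, which is the property used in Example 4.4.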


Activity 4.3 Weighted moving averages 


(a) Give two reasons why the following transformation is not a weighted moving 
average: 


$$z_t = 0.1x_{t-2} + 0.3x_{t-1} + 0.1x_t^2 + 0.3x_{t+1} + 0.1x_{t+2}.$$


(b) Suppose that bimonthly time series data are available, with seasonal period 
T = 6. Write down an appropriate weighted moving average of order 7 for 
smoothing out the seasonal variation. 


In Example 4.4, you saw that the weighted moving average $SA(t)$ with
appropriate order smooths out the seasonality from a seasonal time series
$X_t = m_t + s_t + W_t$: the result is the same as if the transformation had been
applied to the series $m_t + W_t$. In Subsection 4.1, you saw that a simple centred
moving average provides an estimate of the trend component of a non-seasonal
time series. Thus the weighted moving average $SA(t)$ provides an estimate of the
trend component $m_t$ of the seasonal time series $X_t$:

$$\hat{m}_t = SA(t).$$

Now consider the differences between the original series $X_t = m_t + s_t + W_t$ and
$SA(t) = \hat{m}_t$. These differences will be denoted $Y_t$. Since $Y_t = X_t - SA(t)$, we have

$$\begin{aligned}
Y_t &= m_t + s_t + W_t - \hat{m}_t \\
    &= s_t + W_t + (m_t - \hat{m}_t) \\
    &= s_t + W'_t, \text{ say.}
\end{aligned}$$

The term $W'_t = W_t + (m_t - \hat{m}_t)$ is a new irregular component. Thus the time
series $Y_t$ has no trend component: it consists of the seasonal component of the
original series $X_t$ and some irregular variation.

Example 4.5 Visits overseas: removing the trend component


In Example 4.4, the seasonal component was smoothed out of the monthly time
series $x_t$ of (square roots of) numbers of visits overseas by UK citizens by
applying the weighted moving average $SA(t)$. The series of differences
$y_t = x_t - SA(t)$ is shown in Figure 4.7.

Figure 4.7 Visits overseas data: the differences $y_t = x_t - SA(t)$


The trend has been removed, leaving the series

$$Y_t = s_t + W'_t.$$

The irregular component is apparent from the slight variation in the width of the
seasonal fluctuations. In order to estimate the seasonal component of the original
time series, this remaining irregular variation must be removed. ♦
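
The removal of the trend by taking differences from $SA(t)$ can be illustrated on artificial data. The Python sketch below builds a hypothetical quarterly series from an invented linear trend and invented seasonal factors, with no irregular component, so that the differences can be compared directly with the seasonal factors; none of the numbers come from the text.

```python
# Hypothetical quarterly series: linear trend plus seasonal factors, no noise.
factors = [4.0, -1.0, -3.0, 0.0]                    # these sum to 0
trend = [50.0 + 2.0 * t for t in range(12)]
series = [trend[t] + factors[t % 4] for t in range(12)]


def sa_quarterly(x, t):
    """SA(t) for quarterly data, as in (4.3); needs 2 <= t <= len(x) - 3."""
    return (0.5 * x[t - 2] + x[t - 1] + x[t] + x[t + 1] + 0.5 * x[t + 2]) / 4


# Differences y_t = x_t - SA(t) at the time points where SA(t) is defined.
differences = {t: series[t] - sa_quarterly(series, t) for t in range(2, 10)}

for t, y in differences.items():
    print(t, round(y, 10), factors[t % 4])
```

Because the invented trend is exactly linear and there is no noise, $SA(t)$ recovers the trend exactly and each difference equals the corresponding seasonal factor. With real data the differences also contain the irregular term, which is why a further averaging step is needed.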


In general, the irregular variation $W'_t$ may be removed as follows. For each season,
all the values of $Y_t$ corresponding to that season are gathered together, and their
average is calculated. The averages are called the raw seasonal factors and are
denoted $F_j$, $j = 1, 2, \ldots, T$.

For example, for a monthly series, $F_{\text{January}}$, the raw seasonal factor for January, is
given by

$$F_{\text{January}} = \frac{\text{sum of the values } y_t \text{ for January}}{\text{number of January values}}.$$

The corresponding values for February, March, and so on, are obtained in a
similar way. Similarly, for a quarterly series, for each quarter, all the values of $Y_t$
for that quarter are averaged to give the raw seasonal factor for the quarter.

The seasonal factors $s_1, \ldots, s_T$ sum to zero over one seasonal cycle, so to estimate
the seasonal factors, the average of the $F_j$ is subtracted from each raw seasonal
factor. For example, for a monthly series, the seasonal factor for January is
estimated by

$$\hat{s}_{\text{January}} = F_{\text{January}} - \bar{F},$$

where $\bar{F} = (F_{\text{January}} + \cdots + F_{\text{December}})/12$. The seasonal factors for the other
seasons are estimated in a similar way. This method is illustrated in Example 4.6.

Example 4.6 Visits overseas: estimating the seasonal component


For the monthly time series of (square roots of) numbers of visits overseas, there
were 24 values of $Y_t$ for each month. The values of $F_j$ for these data are given in
Table 4.2.


Table 4.2 Estimating the seasonal factors

Month j       Raw factor F_j    Estimated factor ŝ_j
January           -9.5809            -9.576
February         -10.2581           -10.254
March             -5.4079            -5.403
April             -0.6274            -0.623
May                0.4188             0.423
June               6.5996             6.604
July               7.7319             7.736
August            15.2347            15.239
September         10.5442            10.549
October            4.8317             4.836
November          -7.0544            -7.050
December         -12.4856           -12.481


The average of the twelve values $F_j$ corresponding to the months January to
December is -0.00445, so $\bar{F} = -0.00445$, and hence the estimated seasonal
factors are given by $\hat{s}_j = F_j + 0.00445$. The estimated seasonal factors $\hat{s}_j$ rounded
to three decimal places are also shown in Table 4.2. The estimated seasonal
component of the time series comprises the estimated seasonal factors repeated
over successive periods. This is shown in Figure 4.8.

Figure 4.8 The estimated seasonal component for the visits overseas time series


Note how the series in Figure 4.8 differs from that in Figure 4.7: the irregular 
variation has been removed, and the estimated seasonal component now repeats 
exactly every twelve months, as required. 


The values $\hat{s}_j$ in Table 4.2 represent the estimated average seasonal departures 
from the underlying trend. The largest value is positive and corresponds to the 
month of August; the smallest value is negative and corresponds to December. 
These estimated seasonal factors show that the series peaks in August and is 
lowest in December. ♦
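
The arithmetic in Example 4.6 can be checked with a few lines of code. The following is a minimal Python sketch that uses the raw seasonal factors listed in Table 4.2; the variable names are illustrative and are not part of the course materials.

```python
# Raw seasonal factors F_j (January to December), copied from Table 4.2.
raw_factors = [
    -9.5809, -10.2581, -5.4079, -0.6274, 0.4188, 6.5996,
    7.7319, 15.2347, 10.5442, 4.8317, -7.0544, -12.4856,
]

# Average of the raw seasonal factors (F-bar in the text).
f_bar = sum(raw_factors) / len(raw_factors)

# Estimated seasonal factors: subtract the average from each raw factor.
seasonal_factors = [f - f_bar for f in raw_factors]

print(f"average of raw factors: {f_bar:.5f}")
print([round(s, 3) for s in seasonal_factors])
print(f"sum of estimated factors: {sum(seasonal_factors):.6f}")
```

Subtracting the average forces the estimated factors to sum to zero over one seasonal cycle, which is the property required of the seasonal factors.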
In this book, you will not be required to estimate the components of a seasonal 
time series by hand: although the calculations are not difficult, they are tedious, 
so they are best done by a computer. However, you will be required to interpret 
plots corresponding to the various stages of a decomposition of a seasonal time 
series into components, and to interpret estimated trends, raw seasonal factors 
and estimated seasonal factors. The stages involved in estimating the seasonal 
factors are summarized in the following box. 


Estimating the seasonal component of an additive model 


For a seasonal time series $X_t$, which may be described by an additive model, 
and for which the seasonal period is $T$ (an even number), the seasonal 
component $s_t$ may be estimated as follows. 


• First, the series is smoothed using a suitable weighted moving average. 
For example, if $T = 12$, 

\[
SA(t) = \tfrac{1}{12}\left(0.5x_{t-6} + x_{t-5} + \cdots + x_t + \cdots + x_{t+5} + 0.5x_{t+6}\right);
\]

or when $T = 4$, 

\[
SA(t) = \tfrac{1}{4}\left(0.5x_{t-2} + x_{t-1} + x_t + x_{t+1} + 0.5x_{t+2}\right).
\]

• Next, the series of differences $y_t = x_t - SA(t)$ is obtained. 


• The raw seasonal factors $F_j$, $j = 1, \ldots, T$, are calculated as follows: 
for season $j$, $F_j$ is the average of the values of $y_t$ corresponding to that 
season. 


• The average $\bar{F}$ of the raw seasonal factors $F_j$ is obtained. 


• The seasonal factors $s_1, s_2, \ldots, s_T$ are estimated by 

\[
\hat{s}_j = F_j - \bar{F}, \quad j = 1, 2, \ldots, T.
\]
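
The stages in the box can be sketched in code. The following Python sketch is illustrative only: the function names, the convention that the first observation belongs to season 1, and the quarterly test series (a straight-line trend plus a fixed seasonal pattern) are all assumptions made for the example, and the course itself uses SPSS for these calculations.

```python
def weighted_moving_average(x, period):
    """SA(t) for an even seasonal period: weights 0.5, 1, ..., 1, 0.5, divided by the period.

    Positions without a complete window are returned as None.
    """
    half = period // 2
    smoothed = [None] * len(x)
    for t in range(half, len(x) - half):
        window = x[t - half:t + half + 1]
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        smoothed[t] = total / period
    return smoothed


def estimate_seasonal_factors(x, period):
    """Estimate the seasonal factors of an additive model, following the box."""
    smoothed = weighted_moving_average(x, period)
    # Differences y_t = x_t - SA(t), grouped by season
    # (assumption: the first observation belongs to season 1).
    by_season = [[] for _ in range(period)]
    for t, (xt, sa) in enumerate(zip(x, smoothed)):
        if sa is not None:
            by_season[t % period].append(xt - sa)
    raw = [sum(values) / len(values) for values in by_season]  # raw factors F_j
    f_bar = sum(raw) / period                                  # average of the F_j
    return [f - f_bar for f in raw]                            # F_j minus the average


# Hypothetical quarterly series: linear trend plus a seasonal pattern summing to zero.
pattern = [-3.0, 1.0, 4.0, -2.0]
series = [10 + 0.5 * t + pattern[t % 4] for t in range(24)]
factors = estimate_seasonal_factors(series, 4)
print([round(s, 3) for s in factors])
```

Because the test series has no irregular component and an exactly linear trend, the estimated factors should agree with the pattern used to build it, up to rounding error.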


Activity 4.4 will give you some practice at describing and interpreting the results 
at various stages in the estimation of the seasonal component of a time series. 


Activity 4.4 Beer consumption 


In Activity 1.3, the time series of quarterly beer consumption in the UK between 
the first quarter of 1991 and the second quarter of 2004 was introduced. The time 
plot shown in Figure 1.3 is reproduced in Figure 4.9(a). 


Figure 4.9 Quarterly beer consumption (in thousands of hectolitres), 1991-2004 


This time series can be described adequately by an additive model (see Exercise 2.1), so the 
decomposition method described in this subsection can be applied. Figure 4.9(b) 
shows the series after smoothing with the weighted moving average $SA(t)$ of 
order 5. The raw seasonal factors $F_j$ are given in Table 4.3. 


Table 4.3 Raw seasonal factors $F_j$ 


Quarter $j$                $F_j$

1  January-March       -2949.41
2  April-June            500.18
3  July-September        647.19
4  October-December     1760.50


(a) Briefly describe the trend shown in Figure 4.9(b). 


(b) In two sentences, explain how the raw seasonal factors $F_j$ are obtained from 
the series shown in Figure 4.9(a) and Figure 4.9(b). 


(c) Use the raw seasonal factors $F_j$, which are given in Table 4.3, to obtain 
estimates $\hat{s}_j$ of the seasonal factors. 














(d) Interpret the estimated seasonal factors. 


4.3 Seasonally adjusted series 


In Subsection 4.1, you learned how to estimate the trend component for a 
non-seasonal time series; and in Subsection 4.2, you learned how to estimate the 
seasonal component for a seasonal time series. In this subsection, these methods 
are combined to estimate the seasonal, trend and irregular components of a 
seasonal time series. 


Suppose that a time series may be described by the seasonal additive 
decomposition model 


\[
X_t = m_t + s_t + W_t.
\]


The first stage in breaking down the time series into components is to estimate 
the seasonal factors (as described in Subsection 4.2). This produces an estimate $\hat{s}_t$ 
of the seasonal component. This estimate is then used to obtain the seasonally 
adjusted series, which is denoted $Z_t$: 

\[
Z_t = X_t - \hat{s}_t = m_t + W_t + (s_t - \hat{s}_t).
\]

The term $W'_t = W_t + (s_t - \hat{s}_t)$ is a new irregular component. 


The seasonally adjusted series $Z_t$ has no seasonal component, only a trend 
component and an irregular component. Since the seasonal effect has been 
removed, the seasonally adjusted series can be used to make direct comparisons 
between levels in adjacent time periods. Many economic series (such as those for 
some indices of inflation and economic activity) are seasonally adjusted in order 
to reveal underlying trends. (The seasonal adjustment method used for economic time 
series is usually more complicated than that described here, but the principle is the 
same.) 


The seasonally adjusted time series can be analysed using the methods of 
Subsection 4.1: for example, an estimate $\hat{m}_t$ of the trend component $m_t$ can be 
obtained by smoothing the seasonally adjusted series $Z_t$. If required, the irregular 
component of the original series can be estimated by subtracting $\hat{m}_t$, the estimate 
of $m_t$, from $Z_t$, the seasonally adjusted series: 

\[
\hat{W}_t = Z_t - \hat{m}_t.
\]

This completes the decomposition of the original series into components: 

\[
X_t = \hat{m}_t + \hat{s}_t + \hat{W}_t.
\]
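
The three stages of the decomposition can be illustrated with a short Python sketch. Everything in it is hypothetical: the quarterly data, the seasonal factors (assumed to have been estimated already, with the first observation in season 1) and the choice of a simple moving average of order 3 are for illustration only.

```python
def simple_moving_average(z, order):
    """Simple moving average of odd order; None where the window is incomplete."""
    half = order // 2
    smoothed = [None] * len(z)
    for t in range(half, len(z) - half):
        smoothed[t] = sum(z[t - half:t + half + 1]) / order
    return smoothed


# Hypothetical quarterly observations and previously estimated seasonal factors.
x = [12.1, 15.8, 19.2, 13.5, 14.0, 18.1, 21.0, 15.6, 16.2, 19.9, 23.1, 17.4]
s_hat = [-3.0, 1.0, 4.0, -2.0]

# Seasonally adjusted series: subtract the seasonal factor for each quarter.
z = [xt - s_hat[t % 4] for t, xt in enumerate(x)]

# Estimated trend: smooth the seasonally adjusted series.
m_hat = simple_moving_average(z, 3)

# Estimated irregular component: adjusted series minus estimated trend.
w_hat = [None if m is None else zt - m for zt, m in zip(z, m_hat)]

# Wherever the trend estimate exists, the three components add back to x_t.
for t, xt in enumerate(x):
    if m_hat[t] is not None:
        rebuilt = m_hat[t] + s_hat[t % 4] + w_hat[t]
        print(t + 1, round(xt, 2), round(rebuilt, 2))
```

The trend and irregular estimates are missing at the two ends of the series because the moving average needs a complete window; this is a property of the smoothing step, not of the seasonal adjustment.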
Example 4.7 Visits overseas: the seasonally adjusted series 


The seasonal factors of the time series $x_t$ of (square roots of) numbers of visits 
overseas by UK residents were estimated in Example 4.6. The seasonally adjusted 
series $z_t = x_t - \hat{s}_t$ is shown in Figure 4.10(a). 


Figure 4.10 Visits overseas: (a) seasonally adjusted series (b) estimated trend component 


The seasonally adjusted series $z_t$ retains a trend component and an irregular 
component, but the seasonal component of the original time series has been 
removed. The estimated trend component $\hat{m}_t$ is shown in Figure 4.10(b); this was 
obtained by smoothing $z_t$ using a simple moving average of order 21. 


The irregular component, which is obtained by subtracting the estimated trend 
component in Figure 4.10(b) from the seasonally adjusted series in Figure 4.10(a), 
is shown in Figure 4.11. 


Figure 4.11 Visits overseas: irregular component 


Neither trend nor seasonality is apparent in the irregular component. All three 
components of the time series have now been estimated. Adding them together 
would reproduce the original series exactly. ♦
Activity 4.5 Seasonally adjusted beer consumption 


In Activity 4.4, you estimated the seasonal component of the time series for 
quarterly beer consumption in the UK. The seasonally adjusted series is shown in 
Figure 4.12. 


Figure 4.12 Beer consumption, seasonally adjusted series 


(a) Briefly describe this series. 


Two estimates of the trend component are shown in Figure 4.13; these are plotted 
using the same scale as that used in Figure 4.12. 
Figure 4.13 Trend component: (a) moving average of order 3 (b) moving average of order 9 


(b) Figure 4.13(a) was obtained using a simple moving average of order 3; 
Figure 4.13(b) was obtained using a simple moving average of order 9. In 
your opinion, which of the two moving averages results in the better estimate 
of the trend? Justify your answer. 








(c) Summarize the underlying trend in beer consumption. 
Summary of Section 4 


In this section, you have learned how to estimate the seasonal, trend and irregular 
components of a time series that can be described by an additive decomposition 
model. Simple moving averages have been used to estimate the trend component 
of a non-seasonal time series. Under-smoothing and over-smoothing have been 
discussed. You have learned how to estimate the seasonal component of a time 
series using weighted moving averages and how to obtain a seasonally adjusted 
series. Finally, you have seen how the method for estimating the trend component 
of a non-seasonal series can be applied to the seasonally adjusted series to obtain 
estimates of the trend component and the irregular component of the original 
time series. 


Exercises on Section 4 


Exercise 4.1 The FTSE100 index 


The FTSE100 is an index based on the share prices of 100 leading companies, 
known as the ‘footsie’ index. It is one of several such indices used to measure the 
overall performance of the London stock market. The seasonal variation in the 
time series of monthly averages is small and may be ignored. 


Figure 4.14 shows the time plot of logarithms of the value of the FTSE100 index 
at the close of trade on the last day of each month between January 1988 and 
January 2005. 
Figure 4.14 Logarithms of the FTSE100 index, 1988-2005 


(a) The original data were transformed using logarithms so that an additive 
model would be appropriate. In your view, has the transformation achieved 
the desired result? Explain your answer. 
The three time plots in Figure 4.15 were obtained by smoothing the time series in 
Figure 4.14, using simple moving averages of orders 3, 11 and 19. 


Figure 4.15 Three trend estimates for the log FTSE100 index 


(b) Which moving average was used to produce each of the time plots? 


(c) Discuss which of the three smoothed series in Figure 4.15, if any, gives a good 
estimate of the trend. If you do not think that any of them provides a good 
estimate, then suggest another moving average, and explain your choice. 


Exercise 4.2 Temperatures in Recife 


Figure 4.16 contains time plots of the monthly average air temperatures (in 
degrees Celsius) in Recife, Brazil, from January 1953 to December 1962, and of 
the seasonally adjusted series. 


Chatfield, C. (2004) The Analysis of Time Series: An Introduction. Sixth Edition. 
Chapman & Hall/CRC, London. 


Figure 4.16 Temperatures in Recife: (a) original data (b) seasonally adjusted series 


(a) Briefly describe the two series. 
(b) The estimated seasonal factors are given in Table 4.4. 


Table 4.4 Estimated seasonal factors, $\hat{s}_j$ 


Month $j$     $\hat{s}_j$

January       1.075
February      1.315
March         0.979
April         0.621
May          -0.158
June         -1.035
July         -1.788
August       -1.800
September    -0.779
October       0.054
November      0.530
December      0.991


Explain briefly how these were obtained from the series shown in 
Figure 4.16(a), and interpret them. Which is the hottest month in Recife? 
Which is the coldest month? 


Two time plots of the trend component, obtained by smoothing the seasonally 
adjusted series in Figure 4.16(b), are shown in Figure 4.17. 
Figure 4.17 Temperatures in Recife: smoothed seasonally adjusted series 


(c) Compare the smoothness of the two time series. (Hint: look closely at how 
the two plots are drawn.) 








(d) Briefly summarize the trend in average temperatures in Recife between 1953 
and 1962. 
5 Analysing time series in SPSS 


In this section, you will learn how to use SPSS to calculate moving averages and 
to estimate the components of a seasonal time series that can be described by an 
additive model. 


Refer to Chapters 2 and 3 of Computer Book 2 for the work in this 
section. 


Summary of Section 5 


In this section, you have learned how to calculate simple moving averages in 
SPSS, and how to use them to smooth time series. You have also learned how to 
use SPSS to estimate the seasonal factors of a time series and obtain the 
seasonally adjusted series. You have used these methods to analyse time series 
data using the decomposition methods described in Part I. 
Introduction to Part II 


In Part I, you learned how to use time series data to estimate the components of a 
time series. The focus of this type of analysis is to describe the past features of a 
time series. Another important use of time series is to use past values to predict 
future values, a process known as forecasting. 


Forecasting is central to most human activities that involve some degree of 
forward planning. Weather forecasts are perhaps the most obvious example. Our 
reliance on pensions, insurance and mortgages also testifies to the importance we 
place on planning for the future. Good forecasts, and a realistic assessment of the 
likely reliability of forecasts, can help improve such plans. 


Making accurate forecasts is not easy, since forecasting necessarily involves 
extrapolating beyond the range of the available data: all forecasts are based on 
the untestable assumption that some aspect of the process that generated the 
data in the past will also hold for the future. Nevertheless, the quality of forecasts 
can be improved by using good extrapolation techniques. Some of these 
techniques are described in Part II. 





In Sections 6 and 7, a class of simple yet powerful forecasting methods known 
collectively as exponential smoothing is described. These methods are commonly 
used in many fields of application. In Section 8, you will learn how to apply these 
methods in SPSS. In Section 9, several techniques for evaluating the performance 
of these forecasting methods are discussed. You will learn how to use SPSS to 
apply these techniques in Section 10. 
What will the weather be like next month? What are the chances of an influenza 
pandemic next year? By how much will the economy grow over the next five 
years? To answer questions such as these, forecasters have developed complicated 
models that mimic the underlying processes influencing each outcome, whether 
they are weather patterns, the spread of infections or economic performance. 
Complex models are generally required when soundly-based long-term forecasts 
are needed. 





However, in many cases, useful short-term forecasts can be obtained by much 
simpler methods, based on extrapolating historical data. In this section, one such 
approach, known as exponential smoothing, is introduced. In Subsection 6.1, 
the simplest version of the method is described. This involves a single parameter, 
whose value has to be chosen by the investigator. Choosing the value of this 
parameter is the topic of Subsection 6.2. 


6.1 Simple exponential smoothing 


Example 6.1 will help to set the scene for the exponential smoothing method. 


Example 6.1 Predicting tomorrow’s temperature 


Suppose that it is the evening of 14 August and that today’s average temperature 
was 19.4°C. What will the (average) temperature be tomorrow? 











One way to predict tomorrow’s average temperature is to assume it will be the 
same as today’s. But was today’s temperature typical? A time plot of the average 
temperatures recorded on recent days should help to answer this question. 
Figure 6.1 shows the daily average temperatures between 15 July and 14 August. 
(In fact, these are London temperatures for 15 July to 14 August 2004.) 


Figure 6.1 Daily average temperatures, 15 July to 14 August 








Yesterday the average temperature was 17.8°C, and the day before yesterday it 
was 18.9°C. So perhaps today was unusually hot. A better prediction for 
tomorrow might be achieved by taking an average of the last three days’ values, 
for example. 
However, as you can see from Figure 6.1, the average temperature on each of the 
last three days was lower than that on any of the previous fifteen days, when the 
daily average temperature was never below 20.0°C. How should this information 
affect our prediction for tomorrow? 





Clearly, the further back in time we go, the less influence past values should have 
on our prediction for tomorrow’s average temperature; and if we go back too far, 
seasonal effects would also come into play. But it seems reasonable to predict 
tomorrow’s average temperature by combining past values in some way — for 
example, by a weighted average in which more weight is given to more recent 
values. ♦








Suppose that the current time point is $t = n$, and that observations on a time 
series $X_t$ are available up to time $n$. Suppose that the time series $X_t$ can be 
described by an additive model with constant level $m$ and no seasonality. That is, 
suppose that 

\[
X_t = m + W_t,
\]

where $W_t$ is the irregular component. 


The aim is to forecast $X_{n+1}$, given observations $x_n, x_{n-1}, x_{n-2}, \ldots$. This forecast 
is called a 1-step ahead forecast of $X_{n+1}$, and is denoted $\hat{x}_{n+1}$. The hat 
denotes that it is an estimate, and the expression ‘1-step ahead’ emphasizes that 
the forecast is based on historical data up to and including time $n$, so that only 
the next value is being forecasted. 


As suggested in Example 6.1, a reasonable approach is to use a combination of the 
present value and past values. There are many possible ways of doing this. The 
particular combination used in exponential smoothing is 

\[
\hat{x}_{n+1} = c_0 x_n + c_1 x_{n-1} + c_2 x_{n-2} + \cdots,
\]

where 

\[
c_i = \alpha(1 - \alpha)^i, \quad i = 0, 1, 2, \ldots,
\]

and $\alpha$ is a parameter whose value is to be specified, $0 < \alpha < 1$. That is, 

\[
\hat{x}_{n+1} = \alpha x_n + \alpha(1 - \alpha)x_{n-1} + \alpha(1 - \alpha)^2 x_{n-2} + \alpha(1 - \alpha)^3 x_{n-3} + \cdots. \tag{6.1}
\]

The $c_i$ are called exponential weights. It can be shown that they add up to 1, 
so $\hat{x}_{n+1}$, the forecast of $X_{n+1}$, is a weighted average of the current observation 
and all past observations. (You may recognize the exponential weights as forming ...) 
For $0 < \alpha < 1$, the value of $c_i$ decreases as $i$ increases, 
so greater weight is given to more recent observations. For example, with $\alpha = 0.6$, 
the forecast of $X_{n+1}$ obtained using (6.1) is 

\[
\begin{aligned}
\hat{x}_{n+1} &= 0.6x_n + 0.6 \times (1 - 0.6)x_{n-1} + 0.6 \times (1 - 0.6)^2 x_{n-2} + \cdots \\
&= 0.6x_n + 0.24x_{n-1} + 0.096x_{n-2} + \cdots.
\end{aligned}
\]

Thus most weight is given to $x_n$, followed by $x_{n-1}$, and so on into the past. 
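
The behaviour of the exponential weights is easy to explore numerically. The following Python sketch computes the first few weights for the value 0.6 used in the text; the function name and the decision to stop after ten weights are arbitrary choices made for illustration.

```python
def exponential_weights(alpha, count):
    """Return the first `count` weights c_i = alpha * (1 - alpha)**i, i = 0, 1, ..."""
    return [alpha * (1 - alpha) ** i for i in range(count)]


weights = exponential_weights(0.6, 10)

# The first three weights match the coefficients 0.6, 0.24 and 0.096 in the text.
print([round(c, 4) for c in weights[:3]])

# The partial sum of the first k weights is 1 - (1 - alpha)**k,
# which approaches 1 as more terms are included.
print(sum(weights))
```

With ten terms the partial sum is already within about 0.0001 of 1, which shows how quickly the influence of older observations dies away for this value of the parameter.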


Activity 6.1 Calculating weights 
(a) Calculate the weights $c_i = \alpha(1 - \alpha)^i$ for $i = 0, 1, 2, 3, 4$ when $\alpha = 0.5$, and 
when $\alpha = 0.8$. 


(b) Which of these two values of $\alpha$ gives more weight to the current observation? 


The dots on the right of the expression for $\hat{x}_{n+1}$ in (6.1) indicate further terms 
stretching back into the past. Clearly, it is not possible to carry on indefinitely: it 
is necessary to stop at some point in the past. But how many terms should be 
kept? In fact, this problem can be sidestepped, as follows. 


Using (6.1) with $n - 1$ replacing $n$ throughout, the 1-step ahead forecast of $X_n$ is 
given by 

\[
\hat{x}_n = \alpha x_{n-1} + \alpha(1 - \alpha)x_{n-2} + \alpha(1 - \alpha)^2 x_{n-3} + \alpha(1 - \alpha)^3 x_{n-4} + \cdots.
\]

Now note that the expression for $\hat{x}_{n+1}$ in (6.1) may be rewritten as follows: 

\[
\begin{aligned}
\hat{x}_{n+1} &= \alpha x_n + \alpha(1 - \alpha)x_{n-1} + \alpha(1 - \alpha)^2 x_{n-2} + \alpha(1 - \alpha)^3 x_{n-3} + \cdots \\
&= \alpha x_n + (1 - \alpha)\left[\alpha x_{n-1} + \alpha(1 - \alpha)x_{n-2} + \alpha(1 - \alpha)^2 x_{n-3} + \cdots\right].
\end{aligned} \tag{6.2}
\]

But the term in square brackets in (6.2) is $\hat{x}_n$, so we have 

\[
\hat{x}_{n+1} = \alpha x_n + (1 - \alpha)\hat{x}_n. \tag{6.3}
\]


Hence the 1-step ahead forecast of $X_{n+1}$ is a combination of $x_n$, the observed 
value of $X_n$, and $\hat{x}_n$, the 1-step ahead forecast of $X_n$. Expression (6.3) makes it 
possible to obtain a forecast at each time point without having to reach far back 
into the past. The use of (6.3) is illustrated in Example 6.2. 


Example 6.2 Predicting tomorrow’s temperature, continued 


Suppose that today is 14 August and a forecast of the temperature on 15 August 
is required, based on data on temperatures from 15 July to 14 August. Some of 
these temperatures are shown in Table 6.1. For convenience, 15 July is labelled 
$t = 1$, and 14 August is labelled $t = 31$. The parameter value $\alpha = 0.6$ will be used 
(this choice is arbitrary). 





Table 6.1 Observed daily temperatures 


Time $t$    Date          Observed temperature $x_t$

 1         15 July       19.4
 2         16 July       18.9
 3         17 July       18.9
 4         18 July       17.8
 ...       ...           ...
29         12 August     18.9
30         13 August     17.8
31         14 August     19.4


The 1-step ahead forecast of $X_{32}$, the temperature on 15 August, is required; that 
is, $\hat{x}_{32}$ is required. To calculate $\hat{x}_{32}$, the values $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_{32}$ will be obtained by 
iteration: at each time point $t$, $\hat{x}_t$ will be calculated using only the temperatures 
known prior to that time point. 


As with all procedures involving iteration, an initial value is required. In 
exponential smoothing, the initial value is often chosen to be the value of the first 
observation. In this case, the initial value is $x_1$, the temperature observed on 
15 July, so 

\[
\hat{x}_1 = x_1 = 19.4.
\]

Next, $\hat{x}_2$, the 1-step ahead forecast of $X_2$, the temperature on 16 July ($t = 2$), is 
calculated (as if it were not yet known) using (6.3): 

\[
\begin{aligned}
\hat{x}_2 &= 0.6x_1 + 0.4\hat{x}_1 \\
&= 0.6 \times 19.4 + 0.4 \times 19.4 \\
&= 19.4.
\end{aligned}
\]

Then $\hat{x}_3$, the 1-step ahead forecast of $X_3$, the (presumed as yet unknown) 
temperature on 17 July ($t = 3$), is obtained using the observed value $x_2 = 18.9$ 
and the calculated value $\hat{x}_2 = 19.4$ in (6.3): 

\[
\begin{aligned}
\hat{x}_3 &= 0.6x_2 + 0.4\hat{x}_2 \\
&= 0.6 \times 18.9 + 0.4 \times 19.4 \\
&= 19.1.
\end{aligned}
\]


Iteration means repetition of an 
operation, or a sequence of 
operations, at each step using 
the results of previous steps. 
Similarly, using (6.3) to calculate $\hat{x}_4$ gives $\hat{x}_4 = 18.98$. Continuing in this way, 
using (6.3) repeatedly, leads to a value for $\hat{x}_{32}$, the 1-step ahead forecast of the 
temperature on 15 August. Figure 6.2 shows the time series of observed and 
forecasted temperatures up to 14 August. 





Figure 6.2 Observed temperatures and 1-step ahead forecasts (temperature in °C against date, 15 July to 14 August; observed and forecasted series) ♦
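
The iteration in Example 6.2 can be written as a short loop. The following Python sketch applies (6.3) to the first four temperatures in Table 6.1, with the initial forecast set equal to the first observation and the smoothing parameter set to 0.6 as in the example; the function name is illustrative, and only the first four observations are used because Table 6.1 does not list the whole series.

```python
def one_step_forecasts(observations, alpha):
    """Return the forecasts for times 1, ..., n+1 given observations for times 1, ..., n.

    The initial forecast is set to the first observation; each later forecast
    is alpha * (latest observation) + (1 - alpha) * (latest forecast).
    """
    forecasts = [observations[0]]
    for x in observations:
        forecasts.append(alpha * x + (1 - alpha) * forecasts[-1])
    return forecasts


# Temperatures for 15 to 18 July (t = 1, ..., 4), from Table 6.1.
first_days = [19.4, 18.9, 18.9, 17.8]
forecasts = one_step_forecasts(first_days, 0.6)

# forecasts[0] to forecasts[3] are the forecasts for t = 1, ..., 4;
# forecasts[4] is the 1-step ahead forecast for t = 5.
for t, value in enumerate(forecasts, start=1):
    print(t, round(value, 3))
```

The first four values printed agree with the forecasts 19.4, 19.4, 19.1 and 18.98 obtained by hand in the example. Note that each forecast uses only observations from earlier time points.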





In large data sets, calculations such as these are done using a computer. 
Nevertheless, it is important to understand the principle. Activity 6.2 will give 
you some practice in applying the method. 





Activity 6.2 Predicting tomorrow’s temperature 


Continuing the procedure described in Example 6.2 gives the value 
$\hat{x}_{29} \approx 20.26312$. (Note that it is important to keep full accuracy in intermediate 
calculations.) 


(a) Using the data in Table 6.1, obtain the values of $\hat{x}_{30}$, $\hat{x}_{31}$ and $\hat{x}_{32}$. 


(b) Summarize your results, stating the forecast for the temperature on 
15 August to one decimal place. 


As you can see in Figure 6.2, the forecasts track the observed values, and the time 
plot of forecasts is generally smoother (that is, less spiky) than that of the 
observed data. The extent of the smoothing depends on the value chosen for the 
parameter $\alpha$. Note also that the forecasts show the same general pattern as the 
data, but with a slight delay. 


The method that has been described is called simple exponential smoothing. 
The term exponential refers to the fact that the weights $\alpha(1 - \alpha)^i$ lie on an 
exponential curve. The method is called simple because there are more 
complicated versions of exponential smoothing that can be used for time series 
with trend and seasonality. These are discussed in Section 7. 
Section 6 Exponential smoothing 


The method described in this subsection is summarized in the following box. 


Simple exponential smoothing 


If a time series $X_t$ is described by an additive model with constant level and 
no seasonality, 1-step ahead forecasts may be obtained by simple 
exponential smoothing using the formula 

\[
\hat{x}_{n+1} = \alpha x_n + (1 - \alpha)\hat{x}_n,
\]

where $x_n$ is the observed value at time $n$, $\hat{x}_n$ and $\hat{x}_{n+1}$ are the 1-step ahead 
forecasts of $X_n$ and $X_{n+1}$, and $\alpha$ is a smoothing parameter, $0 < \alpha < 1$. 
The method requires an initial value $\hat{x}_1$, which is often chosen to be $x_1$: 

\[
\hat{x}_1 = x_1.
\]


6.2 Choosing the smoothing parameter 


The simple exponential smoothing method described in Subsection 6.1 involves a 
smoothing parameter $\alpha$, $0 < \alpha < 1$. Recall that the forecasts are obtained using 
the formula 

\[
\hat{x}_{n+1} = \alpha x_n + (1 - \alpha)\hat{x}_n.
\]

If $\alpha = 1$, then $\hat{x}_{n+1} = x_n$, so the 1-step ahead forecast is just the current value. A 
value of $\alpha$ close to 1 means that much weight is placed on the most recent 
observation, and less weight on observations in the more distant past. The higher 
the value of $\alpha$, the more jagged (that is, the less smooth) the time series of 
forecasts will be, as the forecasts adjust to changes in recent values. 


Conversely, for a value of $\alpha$ close to 0, more weight is placed on the distant past 
than for a value of $\alpha$ close to 1, and less weight on recent observations. The lower 
the value of $\alpha$, the smoother the forecasts will be, as they are not affected much 
by recent values. If $\alpha = 0$, then $\hat{x}_{n+1} = \hat{x}_n = \cdots = \hat{x}_1 = x_1$, the initial value. In 
this case, the forecasts lie on a horizontal line: the forecasts are all the same, as 
no weight is given to any observation after the first. 


Example 6.3 Annual precipitation in England and Wales 
Precipitation includes any water that falls from the sky as rain, sleet, snow or 
hail, and is measured in millimetres (mm). Figure 6.3 shows the time 
series of annual precipitation for England and Wales, from 1766 to 2004. 
(These data were obtained in February 2005 from the website of the Met Office’s 
Hadley Centre, http://www.met-office.gov.uk.) 


(The smoothing effect associated with different choices of $\alpha$ is 
similar to that obtained when varying the order of a moving 
average, as described in Section 4.) 


Figure 6.3 Annual precipitation (mm) in England and Wales 





Since these are annual data, the series is non-seasonal. There is no clear upward 
or downward trend over time, and the size of the irregular fluctuations appears to 
be constant. Thus simple exponential smoothing can be used to obtain forecasts 
for the years after 2004. 


Figure 6.4 shows two sets of 1-step ahead forecasts that were obtained using 
simple exponential smoothing, as described in Example 6.2, with two different 
values of the smoothing parameter $\alpha$. 


Figure 6.4 Two sets of forecasts for annual precipitation: (a) $\alpha = 0.2$ (b) $\alpha = 0.8$ (forecast against year) 


The value $\alpha = 0.2$ was used to obtain the 1-step ahead forecasts shown in 
Figure 6.4(a). Little weight is placed on the current year’s value, and the resulting 
forecasts are much less variable than the original time series. The value $\alpha = 0.8$ 
was used to obtain the 1-step ahead forecasts in Figure 6.4(b). More weight is 
placed on the current year’s value, and the resulting time series of forecasts is 
much spikier than the series of forecasts obtained using $\alpha = 0.2$. ♦


As you saw in Example 6.3, different values of $\alpha$ produce different forecasts. So 
how should the value of $\alpha$ be chosen? There are two approaches. The first is to 
rely on past experience with other, similar, series: if some value of $\alpha$ has been 
shown to produce accurate forecasts, then that value is used. The second 
approach, which will be described here, is to estimate $\alpha$ from past data. The idea 
is to choose the value of $\alpha$ which, if applied to the data available, would have 
produced the most accurate forecasts. But first, a way to assess the accuracy of 
forecasts is needed. A method is described in Example 6.4. 
Example 6.4 Assessing the accuracy of forecasts 


In Example 6.2, temperature forecasts were obtained for each day from 15 July to 
14 August using $\alpha = 0.6$. These forecasts are shown in Table 6.2, together with 
the temperatures actually observed. 


Table 6.2 Forecasted and observed temperatures 


Time $t$ and date     Forecast temperature $\hat{x}_t$    Observed temperature $x_t$    Forecast error $x_t - \hat{x}_t$

 1   15 July          19.400                      19.4                        0.000
 2   16 July          19.400                      18.9                       -0.500
 3   17 July          19.100                      18.9                       -0.200
 4   18 July          18.980                      17.8                       -1.180
 ...
29   12 August        20.263                      18.9                       -1.363
30   13 August        19.445                      17.8                       -1.645
31   14 August        18.458                      19.4                        0.942


The accuracy of the forecast for each day can be assessed by the difference 
between the observed value and the forecast. This difference is called the forecast 
error and is denoted $e_t$. The forecast errors are shown in the fourth column of 
Table 6.2. The first value is zero because the initial value was chosen to be the 
temperature actually observed on the first day. 


A convenient way of summarizing the overall accuracy of the forecasts by a single 
number is to add up the squares of the forecast errors: 

\[
(0)^2 + (-0.500)^2 + (-0.200)^2 + (-1.180)^2 + \cdots + (-1.363)^2 + (-1.645)^2 + (0.942)^2 = 49.223.
\]

This number is called the sum of squared errors; it is often abbreviated to SSE. 
Thus, in this example, the sum of squared errors is 49.223. ♦
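
The calculation of forecast errors and their sum of squares can be sketched as follows. This Python example uses only the observations and forecasts for 15 to 18 July, so the value it produces is a partial sum over four days and not the full SSE of 49.223 for all 31 days; the function name is illustrative.

```python
def sum_of_squared_errors(observed, forecasts):
    """Sum of squared differences between observed values and their forecasts."""
    return sum((x - f) ** 2 for x, f in zip(observed, forecasts))


# Observed temperatures and 1-step ahead forecasts for t = 1, ..., 4.
observed = [19.4, 18.9, 18.9, 17.8]
forecasts = [19.4, 19.4, 19.1, 18.98]

# Forecast errors: observed value minus forecast.
errors = [x - f for x, f in zip(observed, forecasts)]
print([round(e, 3) for e in errors])

# Partial sum of squared errors over these four days only.
partial_sse = sum_of_squared_errors(observed, forecasts)
print(round(partial_sse, 4))
```

Squaring the errors before adding them prevents positive and negative errors from cancelling, and it penalizes large errors more heavily than small ones.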


In Example 6.4, the 1-step ahead forecast error at time $t$ (which is also called 
the error, or the residual, at time $t$) was defined to be the difference between $x_t$, 
the value observed at time $t$, and $\hat{x}_t$, the 1-step ahead forecast of $X_t$. The overall 
accuracy of the forecasts may be assessed by the sum of squared errors, 
or SSE. 





If the forecast errors are large, they will produce a large value of the SSE. Thus 
lower values of the SSE should correspond to better forecasts. A natural way to 
choose the value of the parameter $\alpha$ is to choose the value which minimizes 
the SSE. This value of $\alpha$ is said to be optimal. The procedure is summarized in 
the following box. 


Forecast errors and choice of smoothing parameter 


The 1-step ahead forecast error at time $t$, which is denoted $e_t$, is the 
difference between the observed value and the 1-step ahead forecast of $X_t$: 

\[
e_t = x_t - \hat{x}_t.
\]

The sum of squared errors, or SSE, is given by 

\[
\text{SSE} = \sum_{t=1}^{n} e_t^2 = \sum_{t=1}^{n} (x_t - \hat{x}_t)^2.
\]

Given observed values $x_1, x_2, \ldots, x_n$, the optimal value of the smoothing 
parameter $\alpha$ for simple exponential smoothing is the value that minimizes 
the sum of squared errors. 
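
One simple way to carry out this minimization is to evaluate the SSE over a grid of candidate values and pick the smallest. The following Python sketch does this for a short hypothetical series; the data, the function name and the grid spacing of 0.1 are assumptions made for illustration, and statistical software may use a finer search.

```python
def sse_for_alpha(x, alpha):
    """SSE of the 1-step ahead forecasts from simple exponential smoothing.

    The initial forecast is set equal to the first observation.
    """
    forecast = x[0]
    total = 0.0
    for value in x:
        total += (value - forecast) ** 2                   # squared forecast error at this time
        forecast = alpha * value + (1 - alpha) * forecast  # forecast for the next time point
    return total


# Hypothetical short series with no obvious trend or seasonality.
data = [19.4, 18.9, 18.9, 17.8, 18.3, 19.0, 18.2, 17.9]

# Candidate values 0, 0.1, ..., 1 for the smoothing parameter.
grid = [i / 10 for i in range(11)]
sse_values = {alpha: sse_for_alpha(data, alpha) for alpha in grid}
best_alpha = min(grid, key=lambda alpha: sse_values[alpha])

for alpha in grid:
    print(alpha, round(sse_values[alpha], 3))
print("value on the grid with the smallest SSE:", best_alpha)
```

A grid search finds only the best value among the candidates tried, so the result is an approximation to the optimal value whose accuracy depends on the grid spacing.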
Example 6.5 illustrates the optimal choice of smoothing parameter. 


Example 6.5 Optimal choice of smoothing parameter 





For the summer temperature data that were introduced in Example 6.1, values of 
the SSE were obtained for several different values of $\alpha$. These are shown in 
Table 6.3. 


Table 6.3 Values of SSE for different $\alpha$ 


Smoothing parameter $\alpha$    SSE            Forecast for 15 August

0                          86.19          19.40
0.1                        71.64          20.15
0.2                        63.00          19.87
0.3                        [illegible]    19.48
0.4                        53.37          19.21
0.5                        50.87          19.07
0.6                        49.22          19.02
0.7                        48.26          19.05
0.8                        47.95          19.13
0.9                        48.29          19.25
1                          49.32          19.40


The SSE is smallest for $\alpha = 0.8$. With this value of $\alpha$, the forecasted temperature 
for 15 August is 19.13°C. The high value of $\alpha$ means that much weight is placed 
on recent observations. ♦





Activity 6.3 Forecasting annual precipitation 


Table 6.4 shows, for different values of $\alpha$, the value of the SSE and the forecasted 
precipitation for 2005, based on the data for 1766 to 2004. 


Table 6.4 SSE and forecasted precipitation for 2005 


Smoothing parameter $\alpha$    SSE          Forecast for 2005 (mm)

0                          6 242 680    805.5
0.01                       3 916 676    913.0
0.02                       3 648 339    929.7
0.05                       3 563 853    944.5
0.1                        3 602 603    959.1
0.2                        3 746 147    970.1
0.3                        3 946 979    966.6
0.4                        4 192 445    957.0
0.5                        4 478 857    947.1
0.6                        4 809 321    940.7
0.7                        5 191 416    939.6
0.8                        5 636 379    944.6
0.9                        6 159 532    955.8
1                          6 781 865    973.6


(a) Identify the optimal value of the smoothing parameter a, and give the 
corresponding forecasted precipitation for 2005 based on data for 1766 to 
2004. 


(b) Interpret the optimal value of a in terms of the weight accorded to recent 
and distant past observations. 


Summary of Section 6 


In this section, the simple exponential smoothing method for forecasting time 
series with constant level and no seasonality has been described. This method 
involves iteration with the initial value chosen to be equal to the first observed 
value of the time series. You have learned how to calculate and interpret 1-step 
ahead forecasts. The effect of using different values for the smoothing parameter 
has been illustrated. Forecast errors have been defined, and the method of 
choosing the value of the smoothing parameter by minimizing the sum of squared 
errors has been described. 





Exercises on Section 6 


Exercise 6.1 Calculating forecasts 


In Activity 4.2, data on successive two-hourly readings on the concentration of a 
chemical process were described. Table 6.5 shows the first four readings, taken at 
times 2, 4, 6 and 8 hours. 


(a) Obtain the 1-step ahead forecast for the concentration at 10 hours using 
simple exponential smoothing, with initial value $\hat{x}_1 = x_1$ and smoothing 
parameter α = 0.2. 


(b) Calculate the SSE to three decimal places. 


Exercise 6.2 Choosing the smoothing parameter 


The time series used in Exercise 6.1 comprises 197 successive two-hourly readings, 
taken at times 2 to 394 hours. Simple exponential smoothing was used with these 
data to predict the concentration at 396 hours (time point 198) for several values 
of the smoothing parameter α. Table 6.6 shows the SSE and the forecast $\hat{x}_{198}$ for 
each of these values of α. 


(a) Identify the optimal value of α among those listed in Table 6.6, and explain 
your choice. Hence write down the corresponding forecast for $X_{198}$. 


(b) Suppose that the value of the smoothing parameter is chosen to be 0.8. Will 
the time series of forecasts be smoother than that obtained using the optimal 
value of α you identified in part (a), or will it be less smooth? Explain your 
answer. 


Table 6.5 Chemical concentrations 


t    Time       Concentration 
1    2 hours    17.0 
2    4 hours    16.6 
3    6 hours    16.3 
4    8 hours    16.1 


Table 6.6 SSE and forecasted concentration 


Smoothing                  Forecast at 
parameter α     SSE        396 hours 
0               32.01      17.00 
0.1             21.83      17.45 
0.2             20.17      17.51 
0.3             19.89      17.50 
0.4             20.07      17.47 
0.5             20.52      17.43 
0.6             21.19      17.41 
0.7             22.09      17.39 
0.8             23.26      17.38 
0.9             24.79      17.39 
1               26.74      17.40 


7 Holt-Winters forecasting 


The simple exponential smoothing method can be used for time series with 
constant level and no seasonality. In this section, more general exponential 
smoothing methods are presented. In Subsection 7.1, an extension of simple 
exponential smoothing which is appropriate for time series with a linear trend is 
described. In Subsection 7.2, this is extended further to time series with a 
seasonal component. A note of caution about forecasting in general is given in 
Subsection 7.3. 





7.1 Holt’s exponential smoothing method 


The simple exponential smoothing method described in Section 6 applies to 
additive time series with constant level and no seasonality, which can be described 
by the model 


$$X_t = m + W_t.$$


The parameter m is the level (or mean) of the series, and $W_t$ is the irregular 
component (with mean zero). What happens if simple exponential smoothing is 
used on a time series with a trend component that is not constant? This is 
illustrated in Example 7.1. 


Example 7.1 Annual average house prices 


Figure 7.1 shows the time plot of annual average house prices in the UK from 
1996 to 2004, and the forecasts obtained using simple exponential smoothing. 
(These data were obtained in February 2005 from the website of the Nationwide 
Building Society, www.nationwide.co.uk/hpi.) 


[Figure 7.1: time plot of price (£) against year, showing observed and forecasted values] 


Figure 7.1 Observed and forecasted average house prices, 1996-2004 


The optimal value of the smoothing parameter for this time series is α = 1. In 
this case, (6.3) reduces to 

$$\hat{x}_{t+1} = x_t,$$

so the forecast at each time point is the current value. (Other values of α all give 
forecasts lower than the current value.) Since there is an increasing 
trend, the forecast always lags behind the observed value. For example, in 2003 
the average house price was £126 840. Hence the forecast for 2004 was also 
£126 840, whereas the observed average price in 2004 was £148 548. ♦ 





The simple exponential smoothing forecast $\hat{x}_{t+1}$ is an estimate of $m_t$, the level of 
the time series at time t, that is, at the preceding time point. When the level is 
constant, so that $m_{t+1} = m_t = m$, this is not a problem: in this case, the method 
gives accurate forecasts. However, when there is a rising trend (as in 
Example 7.1) or a declining trend, the method can produce inaccurate forecasts. 


Suppose that the time series $X_t$ can be described by an additive non-seasonal 
model with a linear trend component, that is, 

$$X_t = m + bt + W_t,$$

where b is the slope of the trend component $m_t = m + bt$. Note that 

$$\begin{aligned}
X_{t+1} &= m + b(t+1) + W_{t+1} \\
        &= (m + bt) + b + W_{t+1} \\
        &= m_t + b + W_{t+1}. \qquad (7.1)
\end{aligned}$$


The 1-step ahead forecast for $X_{t+1}$ obtained using simple exponential smoothing 
is an estimate of $m_t$, the level at time t, so that 

$$\hat{x}_{t+1} = \hat{m}_t.$$


From expression (7.1), since $W_{t+1}$ has mean zero, the expected value of $X_{t+1}$ is 
$m_t + b$. Hence a better forecast is 

$$\hat{x}_{t+1} = \hat{m}_t + \hat{b},$$

where $\hat{b}$ is an estimate of the slope. In Example 7.1, this would mean ‘adding a bit 
on’ to this year’s estimate, in line with recent growth in house prices. 
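

The effect of ‘adding a bit on’ can be seen in a small numerical sketch. The series below is a hypothetical noise-free linear trend (not data from this book), and the slope is treated as known purely for illustration. With α = 1, simple exponential smoothing forecasts each value by the previous one, so every forecast after the first falls short by exactly the slope; adding the slope removes the lag.

```python
b = 5.0                                      # slope of the hypothetical trend
x = [100.0 + b * t for t in range(1, 9)]     # x_t = m + b*t with m = 100, no irregular component

# Simple exponential smoothing with alpha = 1: the forecast is the previous value
# (the first forecast is taken to be x_1 itself).
lagging = [x[0]] + x[:-1]
errors_lagging = [obs - fc for obs, fc in zip(x, lagging)]

# 'Adding a bit on': previous value plus the slope.
adjusted = [x[0]] + [previous + b for previous in x[:-1]]
errors_adjusted = [obs - fc for obs, fc in zip(x, adjusted)]

print(errors_lagging)    # first error 0, then every error equals the slope b
print(errors_adjusted)   # every error is 0
```

In practice the slope is not known and has to be estimated from the data, which is what Holt’s method does.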


Holt’s exponential smoothing method provides a way of estimating both the 
level $m_t$ and the slope b at each time point. This is achieved by a smoothing 
method very similar to that used in simple exponential smoothing. (The details 
are omitted. You will use SPSS to do the calculations in Section 8.) The main 
difference between Holt’s method and simple exponential smoothing is that the 
smoothing is now controlled by two parameters: one parameter α for the 
estimate of the level $m_t$ at the current time point, and one parameter γ for the 
estimate of the slope b. (The symbol γ is the Greek lower-case letter gamma.) 
The values of both parameters, α and γ, lie between 0 
and 1. As with simple exponential smoothing, values of the parameters close to 0 
mean that little weight is placed on the most recent observations. On the other 
hand, values close to 1 mean that much weight is placed on the most recent 
observations. 


Example 7.2 Annual average house prices: Holt’s method 


Figure 7.2 shows the observed average house prices and the forecasted values 
obtained using Holt’s exponential smoothing method. 


Figure 7.2 Observed and forecasted values using Holt’s method 


Notice that the forecasts obtained using Holt’s exponential smoothing method are 
much more accurate than those obtained using simple exponential smoothing 
(which are shown in Figure 7.1). ♦ 


To implement Holt’s method, two initial values are needed: an initial value for 
the level, and an initial value for the slope. The initial value for the level is taken 
to be $x_1$, as for simple exponential smoothing. There are several possibilities for 
the initial value for the slope, including $x_2 - x_1$, the difference between the first 
two observations, and 0, which is a sensible choice when the slope is not clear. In 
Example 7.2, the initial value for the slope parameter was the difference $x_2 - x_1$. 








The sum of squared errors SSE is defined in exactly the same way as for simple 
exponential smoothing: 

$$\mathrm{SSE} = \sum_{t=1}^{n} (x_t - \hat{x}_t)^2.$$

The values of the smoothing parameters α and γ are chosen so that the SSE is 
minimized. The forecasts in Figure 7.2 were obtained using these optimal values: 
α = 0.94 and γ = 1. Thus the most recent values of the level and the slope are 
given much weight. A further illustration of the method is given in Example 7.3. 
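

Although the details of Holt’s method are omitted from this book, the idea of choosing α and γ by minimizing the SSE can be sketched in Python. The recursion below is one commonly quoted form of Holt’s method and is an assumption: it is not necessarily the form implemented in SPSS. The function names, the data and the grid of candidate values are hypothetical.

```python
from itertools import product


def holt(x, alpha, gamma, slope0=0.0):
    """Return (SSE, forecast of the next value) for an assumed form of Holt's method."""
    level, slope = x[0], slope0        # initial level x_1 and a supplied initial slope
    total = 0.0                        # the forecast of x_1 is x_1 itself, so e_1 = 0
    for obs in x[1:]:
        forecast = level + slope       # 1-step ahead forecast: level plus slope
        total += (obs - forecast) ** 2
        new_level = alpha * obs + (1 - alpha) * forecast
        slope = gamma * (new_level - level) + (1 - gamma) * slope
        level = new_level
    return total, level + slope


def best_parameters(x, grid, slope0=0.0):
    """Grid search for the (alpha, gamma) pair with the smallest SSE."""
    return min(product(grid, grid), key=lambda p: holt(x, p[0], p[1], slope0)[0])


series = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]    # hypothetical series with a rising trend
grid = [i / 10 for i in range(11)]          # candidate values 0, 0.1, ..., 1
start_slope = series[1] - series[0]         # initial slope x_2 - x_1

alpha, gamma = best_parameters(series, grid, start_slope)
print(alpha, gamma, holt(series, alpha, gamma, start_slope))
```

On an exactly linear series such as 1, 3, 5, 7, 9 with initial slope 2, this sketch gives zero forecast errors and forecasts 11 for the next value, whatever values of α and γ are used.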


Example 7.3 Central England temperatures, 1901-2004 


Suppose that we wish to forecast the average temperature in 2005 in Central 
England, using the annual average temperatures for Central England from 1901 to 
2004. The first step is to obtain the time plot of the data. This is shown in 
Figure 7.3. (This data set is a subset of the data set described in Example 1.2.) 


[Figure 7.3: time plot of temperature (°C) against year] 


Figure 7.3 Annual average Central England temperatures (°C), 1901-2004 


The data are annual, so the time series is not seasonal, and the width of the 
irregular fluctuations does not appear to vary with the level. Temperatures 
appear higher towards the end of the series than at the beginning, so it is not 
unreasonable to assume that there might be an increasing trend. Thus Holt’s 
exponential smoothing method can be used to obtain 1-step ahead forecasts, and 
in particular the forecast for 2005. 


The 1901 temperature was 9.11°C, so the initial value for the level is $x_1 = 9.11$. 
For an initial value for the slope, the difference between the first two values could 
be used: in this case, $x_2 - x_1 = 8.83 - 9.11 = -0.28$. Alternatively, 0 could be 
used as the initial value. Since the trend is not obvious, but is unlikely to be 
downward, 0 seems more appropriate here than −0.28. 





The values of the smoothing parameters α and γ that give the smallest value of 
the SSE are (to two decimal places) α = 0.05 and γ = 0.36. Thus the level 
depends very little on recent values, and the slope a little more so. 


The 1-step ahead forecasts for the period 1901 to 2005 are shown in Figure 7.4. 


[Figure 7.4: time plot of forecast (°C) against year] 


Figure 7.4 1-step ahead forecasts for annual average temperatures (°C), 
1901-2005 


The time series of forecasts is quite smooth: this is because the value of α is close 
to zero. For Holt’s method with optimal parameter values, the SSE is 25.32. The 
forecasted average temperature for 2005 is 10.59°C. 


What would have happened if the simple exponential smoothing method had been 
used with these data? For simple exponential smoothing, the optimal value of α is 
0.15 in this case, the SSE is 25.64, and the forecasted temperature is 10.24°C. 
Thus the SSE is slightly higher and the forecasted temperature slightly lower 
than with Holt’s method. In this example, Holt’s method is only marginally 
preferable (owing to the smaller SSE) to the simple exponential smoothing 
method. ♦ 








In Example 7.3, Holt’s method provided only a marginal improvement over simple 
exponential smoothing. This illustrates a useful property of all exponential 
smoothing methods: they are quite robust. In other words, moderate departure 
from the assumptions does not generally have a large adverse effect on the 
accuracy of the forecasts. In Example 7.1, the level of the time series is clearly not 
constant, so the departure from the assumption of constant level was decidedly 
not moderate: simple exponential smoothing gave inaccurate forecasts in that 
example. 


Activity 7.1 will give you some practice at interpreting the forecasts obtained 
using Holt’s exponential smoothing method. 


Activity 7.1 Forecasting house prices 


Examples 7.1 and 7.2 used annual house price data. This activity is based on the 
monthly time series of average house prices, expressed in pounds sterling, in 
England between January 1991 and January 2005. (This time series was introduced 
in Example 1.4.) In this activity, you will 
discuss forecasts of average house prices for February 2005, obtained by applying 
Holt’s exponential smoothing to the logarithms of monthly average house prices 
between January 1996 and January 2005. The log transformation is used so that 
the monthly time series may be described using an additive model. 


The time plot of the natural logarithms of average house prices is shown in 
Figure 7.5. 


Figure 7.5 Logarithms of average house prices, January 1996 to January 2005 


The first two values of the time series are 10.830 13 in January 1996, and 
10.844 58 in February 1996. 


(a) Why is Holt’s exponential smoothing to be preferred over simple exponential 
smoothing for this time series? 





(b) Suggest appropriate initial values for the level and the slope. 


(c) Table 7.1 shows the values of the SSE obtained for several pairs of values of 
the parameters α and γ, and the corresponding forecasts for February 2005. 
Identify the optimal combination of parameter values, and obtain the 
forecasted average house price in pounds sterling for February 2005, to the 
nearest £100. 


Figure 7.6 shows the observed values and the 1-step ahead forecasts for the year 
leading up to February 2005, expressed on the original scale (pounds sterling). 


Figure 7.6 Observed and forecasted average house prices, March 2004 to 
January 2005 


(d) The upward trend in house prices ended in August 2004. Briefly describe how 
the forecasts adjusted to this change. 


Table 7.1 Parameters, SSE and forecasts 


SSE          Forecast 
0.012 72     11.9412 
0.011 80     11.9312 
0.011 05     11.9277 
0.010 75     11.9373 
0.010 47     11.9294 
0.010 72     11.9265 

[The columns giving the values of α and γ for each row are missing.] 




7.2 Holt-Winters exponential smoothing 


Holt’s exponential smoothing method applies only to non-seasonal time series. A 
further extension of exponential smoothing, that includes both a linear trend 
component and seasonality, is known as Holt-Winters exponential 
smoothing. This method is appropriate for time series that can be described by 
an additive model with a linear trend component m + bt and a seasonal 
component $s_t$, that is, 

$$X_t = m + bt + s_t + W_t.$$

(A version of the Holt-Winters method exists for multiplicative models, but this 
will not be discussed in this book.) 


Using the Holt-Winters method, the 1-step ahead forecast of $X_{t+1}$ is 

$$\hat{x}_{t+1} = \hat{m}_t + \hat{b} + \hat{s}_{t+1}.$$


This method involves three smoothing parameters. In addition to the two 
parameters for Holt’s exponential smoothing method (α for the level and γ for the 
slope), there is a third smoothing parameter, δ. (The symbol δ is the Greek 
lower-case letter delta.) This parameter adjusts the 
estimate of the seasonal component. Optimal values of the three parameters α, γ 
and δ may be chosen by minimizing the SSE. For all three parameters, values 
close to 1 indicate that much weight is placed on the most recent observations, 
and values close to 0 indicate that little weight is placed on the most recent 
observations. 





There are many different ways of choosing initial values. For simplicity, $x_1$ will 
generally be used as the initial value for the level, and either 0 or $(x_{T+1} - x_1)/T$ 
as the initial value for the slope, where T is the period of the seasonal component. 
Initial values for the seasonal factors can be obtained by estimating the seasonal 
components using the method described in Subsection 4.2. 
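

To make the roles of the three parameters and the initial values concrete, here is a minimal Python sketch of additive Holt-Winters smoothing. The update equations are one commonly quoted form and are an assumption, since this book does not give them and SPSS may differ in detail. The initial level, slope and seasonal factors are passed in as arguments rather than computed, and all names and data are hypothetical.

```python
def holt_winters(x, alpha, gamma, delta, level0, slope0, seasonal0):
    """Additive Holt-Winters sketch: return (SSE, forecast of the next value).

    level0 and slope0 are the level and slope at the first time point;
    seasonal0[i] is the initial seasonal factor for positions i, i + T, i + 2T, ...
    (0-based), where T = len(seasonal0) is the period.
    """
    period = len(seasonal0)
    seasonal = list(seasonal0)
    level, slope = level0, slope0
    total = 0.0
    for t in range(1, len(x)):                 # 0-based position of the value being forecast
        i = t % period
        forecast = level + slope + seasonal[i]
        total += (x[t] - forecast) ** 2
        new_level = alpha * (x[t] - seasonal[i]) + (1 - alpha) * (level + slope)
        slope = gamma * (new_level - level) + (1 - gamma) * slope
        seasonal[i] = delta * (x[t] - new_level) + (1 - delta) * seasonal[i]
        level = new_level
    next_forecast = level + slope + seasonal[len(x) % period]
    return total, next_forecast


# Hypothetical series: level 12 at the first time point, slope 2, period 2,
# seasonal factors +1 and -1, and no irregular component.
series = [13.0, 13.0, 17.0, 17.0, 21.0, 21.0]
print(holt_winters(series, 0.5, 0.5, 0.5, level0=12.0, slope0=2.0, seasonal0=[1.0, -1.0]))
```

Because the hypothetical series follows the model exactly and the initial values are exact, the forecast errors are all zero and the next forecast is 25. Setting delta to 0 leaves the initial seasonal factors unchanged throughout.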


Example 7.4 Forecasting house prices 


In Activity 7.1, the application of Holt’s exponential smoothing method to the 
time series of logarithms of average house prices was discussed. In fact, house 

prices are seasonal (see Figure 1.8(b)). Therefore the Holt—Winters method is 

more appropriate. 





The initial values will be chosen as follows. The initial value for the level is the 
first value of the series, namely $x_1 = 10.830\,13$. This value corresponds to 
January 1996. Since the period of the seasonal component is 12, the initial value 
for the slope is the difference between the first two January values, divided by 12: 

$$(x_{13} - x_1)/12 = (10.908\,81 - 10.830\,13)/12 \approx 0.0066.$$
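

The arithmetic can be checked with a short Python sketch. The helper name is hypothetical. Only the two January values quoted above are taken from the example; the intervening entries are placeholders because the other monthly values are not given here.

```python
def initial_slope(x, period):
    """Initial slope (x_{T+1} - x_1) / T for a series with seasonal period T."""
    return (x[period] - x[0]) / period


# January 1996 and January 1997 log prices from the example; the eleven months
# in between are placeholders, not data.
log_prices = [10.83013] + [None] * 11 + [10.90881]

print(round(initial_slope(log_prices, 12), 4))   # 0.0066
```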








The initial values for the seasonal factors were derived using the method described 
in Subsection 4.2 for estimating the seasonal factors of a time series. (In this 
book, you will not need to supply starting values for the [...].) Using these 
initial values, the optimal parameter values were α = 0.9, γ = 0.3 and δ = 0.0. 
The fact that the estimate of the parameter δ was 0 means that the initial values 
for the seasonal factors did not need to be updated; it does not mean that the 
seasonal component was not required in the model. 


For the Holt—Winters forecasting method with the optimal parameter values, the 
SSE is 0.006 22, compared to 0.01047 for Holt’s method. Thus the Holt—Winters 
method gives better forecasts, as would be expected since the time series is 
seasonal. 





Figure 7.7 (overleaf) shows the forecasts for the year leading up to February 2005, 
on the original scale; these were obtained by taking exponentials. 


Figure 7.7 House prices: observed value, Holt forecast and Holt-Winters forecast 


It appears that the Holt-Winters forecasts responded slightly more rapidly than 
the Holt forecasts to the change in the slope of the trend in August 2004. ♦ 


Activity 7.2 provides a further comparison of the three exponential smoothing 
methods described in Sections 6 and 7. 


Activity 7.2 Forecasting monthly temperatures 





The time plot of monthly average temperatures in Central England from 
January 1970 to December 2004 is shown in Figure 7.8(a). 


Figure 7.8 Central England temperatures: (a) time plot, 1970-2004 (b) observed values and forecasts, 
February 2004 to January 2005 


Figure 7.8(b) shows the 1-step ahead forecasts for the year up to January 2005 
obtained by using the simple, Holt and Holt-Winters exponential smoothing 
methods. 


(a) The SSEs for the three methods are 3387, 799 and 3300 (not necessarily in 
that order). Which SSE corresponds to the Holt—Winters model, and why? 


(b) The forecasted monthly temperatures for February 2005 were 5.3°C using 
simple exponential smoothing, 2.6°C using Holt’s method, and 4.8°C using 
the Holt—Winters method. Which forecast is likely to be the most reliable? 
Briefly justify your answer. 


The Holt-Winters exponential smoothing method allows for both a linear trend 
and seasonality. Occasionally, there is no trend, just seasonal variation around a 
broadly constant level. In this case, the smoothing parameter γ for the slope is 
omitted, and the resulting method is known as Winters exponential 
smoothing. This method will not be discussed further in this book. 


7.3 A note of caution 


All forecasting methods based on time series data are based on the assumption 
that the past is a useful guide to the future, and hence that the values observed in 
the past can be used to predict those that will arise in the future. This 
assumption is very much an act of faith. Furthermore, there are plenty of 
examples to show that it can be wrong. One such example is described in 
Example 7.5 and Activity 7.3. 





Example 7.5 Forecasting the FTSE100 index 


The FTSE100 index was described in Exercise 4.1. Changes in the value of the 
index are used to measure the performance of financial markets. 


Figure 7.9 shows the time plot of the FTSE100 index at close of trade on the last 
day of each month between April 1984 and September 1987, together with the 
1-step ahead forecasts produced by Holt’s exponential smoothing method. 


Figure 7.9 FTSE100 index: observed values and forecasts 


The optimal values of the smoothing parameters, which were used to obtain the 
forecasts, were α = 0.91 and γ = 0.09. 


The index climbed steadily throughout this period. In line with this performance, 
the forecast for October 1987 was 2406.94. What happened next is the subject of 
Activity 7.3. ♦ 


Several other exponential 
smoothing methods exist, but 
are not covered in this book. 


Activity 7.3 What happened next 


The steady upward trend in the FTSE100 index which continued through the mid 
1980s reflected a generally confident mood in the world of finance. For example, 
on 26 August 1987 the Wall Street Journal (a leading business newspaper in the 
USA) quoted a financial expert as saying that ‘It’s pretty much taken for granted 
now that the market is going to go up’. (Other voices urged caution, warning that 
the steady upward trend was unsustainable.) 


On 19 October 1987 the world’s stock markets crashed and share prices dropped 
sharply. Figure 7.10 shows the FTSE100 index and the forecasts for the period 
April 1984 to October 1987. 


Figure 7.10 The October 1987 stock market crash 


The forecast for October 1987 obtained using Holt’s method, as described in 
Example 7.5, was 2406.94 points; the actual value was 1749.80. 





(a) Calculate the forecast error for October 1987. Comment on the accuracy of 
the forecast for October 1987, compared to the accuracy of forecasts for the 
months prior to October 1987. 


(b) The forecasts shown in Figure 7.10 were obtained using Holt’s exponential 
smoothing method. Could the accuracy of the forecast for October 1987 have 
been improved by using the Holt-Winters method? Explain your answer. 


Example 7.5 and Activity 7.3 illustrate the important point that achieving a good 
agreement between observed values and predicted values using past data by 
minimizing the SSE does not guarantee that future predictions will be accurate. 
In this connection, it is important to distinguish between in-sample forecasting 
errors (for which the SSE is minimized) and out-of-sample forecasting errors, 
namely those obtained by applying the method to forecast values that genuinely 
lie in the future. In general, out-of-sample forecasting errors tend to be larger 
than in-sample forecasting errors. 
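

The distinction can be illustrated with a small Python sketch. It assumes the usual simple exponential smoothing recursion with $\hat{x}_1 = x_1$. The series is hypothetical: its final value is a sudden drop that the earlier values give no hint of. The smoothing parameter is chosen using the earlier values only, and the final value is then forecast with that parameter.

```python
def ses_errors(x, alpha, start=1):
    """1-step ahead errors e_t = x_t - xhat_t for t >= start (t counted from 1)."""
    forecast = x[0]                  # xhat_1 = x_1
    errors = []
    for t, obs in enumerate(x, start=1):
        if t >= start:
            errors.append(obs - forecast)
        forecast = alpha * obs + (1 - alpha) * forecast
    return errors


def sse(errors):
    """Sum of squared errors."""
    return sum(e * e for e in errors)


series = [20.0, 21.0, 19.5, 20.5, 20.0, 21.5, 20.5, 14.0]   # hypothetical; last value drops sharply
train = series[:-1]                                        # values available when forecasting
grid = [i / 10 for i in range(11)]

# In-sample: choose alpha by minimizing the SSE over the training values only.
alpha = min(grid, key=lambda a: sse(ses_errors(train, a)))
in_sample = ses_errors(train, alpha)

# Out-of-sample: forecast the held-back final value using the chosen alpha.
out_of_sample = ses_errors(series, alpha, start=len(series))

print(alpha, max(abs(e) for e in in_sample), out_of_sample[0])
```

Whatever value of α is chosen, each forecast here lies between 19.5 and 21.5, so no in-sample error exceeds 2 in size, whereas the out-of-sample error is at least 5.5 in size.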


No statistical forecasting method could have predicted that share prices would 
collapse in October 1987 because, in this particular case, the recent past was a 
very bad guide to what was to happen next. (If statisticians were able to predict 
such events, they would be very rich indeed!) Nevertheless, exceptional events such 
as this are uncommon. In many contexts, time series are more predictable, and 
short-term extrapolation methods, including those based on exponential 
smoothing, provide reliable results, most of the time... 







Summary of Section 7 


In this section, simple exponential smoothing has been extended to more general 
settings. You have learned how to obtain 1-step ahead forecasts using Holt’s 
exponential smoothing method for time series with a linear trend. The 
Holt—Winters method, which applies to seasonal time series with linear trend, has 
been described. Some of the limitations of statistical forecasting methods have 
been emphasized. 





Exercise on Section 7 


Exercise 7.1 Choosing a forecasting method 


Figure 7.11 shows the time plots of four time series. You may assume that any 
cyclic variation in these plots is seasonal. 


Figure 7.11 Four time series 


For each of the time series in Figure 7.11, exponential smoothing is to be used to 
forecast the next value of the series. For each series, state which of simple 
exponential smoothing, Holt’s exponential smoothing and Holt—Winters 
exponential smoothing you might use, if any, and which of the methods you would 
not use. In each case, give your reasons. 


8 Exponential smoothing in SPSS 


In this section, you will learn how to use SPSS to obtain 1-step ahead forecasts 
using simple exponential smoothing, Holt’s method and the Holt—Winters 
method. You will also learn how to obtain forecasts more than one step ahead. 


Refer to Chapters 4 and 5 of Computer Book 2 for the work in this 
section. 





Summary of Section 8 


In this section, you have learned how to implement simple, Holt’s and 
Holt—Winters exponential smoothing in SPSS, and how to obtain forecasts using 
these methods. 


9 Autocorrelation and prediction intervals 


Two questions are addressed in this section. 
• Could the forecasting method be improved upon? 


• How accurate are the forecasts of future values? 





Producing a forecast is easy, but producing a good forecast is much more difficult. 
Ultimately, it can only be known for sure if a forecasting method is any good by 
comparing the forecast with the actual outcome. However, it is also possible, to 
some limited extent, to assess the reliability of a forecast by studying the 
statistical properties of the method used to obtain it, and by calculating an 
estimate of the uncertainty surrounding it. 


The key idea in this section is to make use of information about the correlations 
between successive forecast errors. In Subsection 9.1, these correlations are 
discussed, and a graphical method for displaying them is presented. In 
Subsection 9.2, two tests for zero correlation are described. A simple method for 
assessing the uncertainty of a forecast is described in Subsection 9.3. 


9.1 The correlogram 


The exponential smoothing methods described in Sections 6 and 7 all involve 
obtaining 1-step ahead forecasts $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n$ of the past values $x_1, x_2, \ldots, x_n$, 
from which forecast errors $e_t = x_t - \hat{x}_t$ are calculated. The 1-step ahead forecast 
errors $e_t$ are observations on random variables $E_t$, which themselves constitute a 
time series: 

$$E_t = X_t - \hat{x}_t.$$


Provided that an appropriate forecasting method was chosen, the forecast errors 
should not display any trend or seasonality. They should fluctuate around zero, 
with roughly constant variance. 


In this subsection, the correlations between the forecast errors are investigated. 
(Correlation is discussed in the Introduction to statistical modelling.) If 
the correlation between $E_{t-1}$ and $E_t$, say, or between $E_{t-2}$ and $E_t$, is non-zero, 
then it should be possible to improve upon the forecasting method. This idea is 
developed in Example 9.1. 


Example 9.1 British Government securities 


Figure 9.1(a) shows the time plot for the monthly percentage yields on British 
Government securities, for 21 years between 1950 and 1970. (For definiteness, the 
time span January 1950 to December 1970 has been used.) (Chatfield, C. (2004) 
The Analysis of Time Series: An Introduction.) 


Figure 9.1 Percentage yields on Government securities: (a) time plot (b) forecast errors 


Analysis of the time series suggested that there is no substantial seasonal 
variation, so Holt’s exponential smoothing method was used to obtain 1-step 
ahead forecasts. The forecast errors are shown in Figure 9.1(b). The errors 
appear to be distributed around zero, with constant variance. But are the errors 
correlated? 


Suppose, for example, that the correlation between $E_{t-1}$ and $E_t$ is positive. Then 
a large positive forecast error will tend to be followed by another large positive 
forecast error, and a large negative forecast error by another large negative error. 
The correlation between the forecast errors can be investigated by arranging the 
forecast errors in pairs $(e_{t-1}, e_t)$ and calculating the correlation coefficient for 
these pairs. There are n = 252 time points, so there are 251 pairs: 
$(e_1, e_2), (e_2, e_3), \ldots, (e_{251}, e_{252})$. The sample correlation coefficient is 0.34, which 
suggests a weak positive correlation. 


Is $E_{t-2}$ correlated with $E_t$? To investigate this, the sample correlation coefficient 
can be calculated from the 250 pairs $(e_1, e_3), (e_2, e_4), \ldots, (e_{250}, e_{252})$. The 
correlation coefficient is 0.045, which is close to zero, suggesting that there is no 
correlation. 
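

The pairing described above is easy to express in Python. The sketch below forms the pairs at a given lag and computes the ordinary sample (Pearson) correlation coefficient of those pairs. The forecast errors used are hypothetical, since the 252 errors from this example are not listed, and the function names are illustrative only.

```python
from math import sqrt


def lagged_pairs(e, k):
    """Pairs (e_t, e_{t+k}) for t = 1, ..., n - k, where k >= 1."""
    return list(zip(e[:-k], e[k:]))


def pearson(pairs):
    """Sample Pearson correlation coefficient of a list of (u, v) pairs."""
    n = len(pairs)
    mean_u = sum(u for u, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    suv = sum((u - mean_u) * (v - mean_v) for u, v in pairs)
    suu = sum((u - mean_u) ** 2 for u, _ in pairs)
    svv = sum((v - mean_v) ** 2 for _, v in pairs)
    return suv / sqrt(suu * svv)


errors = [0.5, 0.8, 0.3, -0.4, -0.9, -0.2, 0.6, 0.7, -0.1, -0.6]   # hypothetical forecast errors

print(len(lagged_pairs(errors, 1)))        # 9 pairs from 10 errors
print(pearson(lagged_pairs(errors, 1)))    # correlation between successive errors
print(pearson(lagged_pairs(errors, 2)))    # correlation between errors two time points apart
```

With 252 errors, the same function would give 251 pairs at lag 1 and 250 pairs at lag 2, as in the example.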


These correlations may also be conveyed graphically. Figure 9.2(a) shows a 
scatterplot of the pairs $(e_1, e_2), (e_2, e_3), \ldots, (e_{251}, e_{252})$. 


Figure 9.2 Scatterplots of forecast errors: (a) $e_t$ and $e_{t-1}$ (b) $e_t$ and $e_{t-2}$ 


Figure 9.2(b) shows a scatterplot of the pairs $(e_1, e_3), (e_2, e_4), \ldots, (e_{250}, e_{252})$. The 
points in the scatterplot in Figure 9.2(a) lie around a line with positive slope, so 
this suggests that the correlation between $E_t$ and $E_{t-1}$ is positive. On the other 
hand, the scatterplot in Figure 9.2(b) suggests there is little if any correlation 
between $E_t$ and $E_{t-2}$. 


The existence of a positive correlation between the random variables $E_{t-1}$ and $E_t$ 
has the following practical consequence. Suppose that the value $x_{n-1}$ has been 
observed at time n − 1, and that the 1-step ahead forecast $\hat{x}_n$ has been obtained. 
This is a forecast of $X_n$, the value at time n. The forecast error at time n − 1 is 
$e_{n-1} = x_{n-1} - \hat{x}_{n-1}$. Suppose that this is large and positive. Since the correlation 
between one forecast error and the next is positive, we might expect that 
$E_n = X_n - \hat{x}_n$ will also turn out to be large and positive. In this case, it would be 
sensible to increase the forecast $\hat{x}_n$ a little, so as to reduce the likely forecast 
error. Thus knowledge of the correlation suggests a way of improving the forecast. 
In other words, the forecast method that has been used is not the best possible, 
since a way has been found of improving upon it. ♦ 





In Example 9.1, the sample correlation was calculated using the pairs 
$(e_1, e_2), (e_2, e_3), \ldots, (e_{251}, e_{252})$. One way to think of this is as follows. Start with 
the original time series $e_1, e_2, \ldots, e_{252}$ and shift it along one place, removing the 
last observation. This gives the time series $*, e_1, e_2, \ldots, e_{251}$, where the star 
denotes an empty position. The original time series is said to have been lagged 
by one place. The correlation is then calculated between the original time series 
and the lagged time series, ignoring the values in the first position. This is the 
correlation at lag one, the lag being the number of places the time series has been 
shifted. 








Similarly, the correlation at lag 3, for example, would be calculated between the 
time series $e_1, e_2, e_3, \ldots, e_{252}$ and the time series $*, *, *, e_1, e_2, \ldots, e_{249}$, ignoring 
the first three values. 





Since these correlation coefficients are calculated between the original time series 
and lagged versions of the same time series, they are called sample 
autocorrelations. The formula for calculating a sample autocorrelation is very 
similar (but not identical) to that used for calculating the sample Pearson 
correlation coefficient, and it gives virtually identical results in large samples. 
(The algebraic expression for calculating the sample autocorrelation has been 
omitted as you will not be required to calculate any autocorrelations by hand.) 


Sample autocorrelations are defined in the following box. 


Sample autocorrelation at lag k 


If X; is a time series with n observed values 71, %2,...,%n, then the time 
series lagged by k places is the time series with X;_, in position t. ‘The 
first k positions of the lagged series comprise missing values. 


The sample autocorrelation at lag k is a correlation coefficient rę 
calculated between a time series and a copy of itself, lagged by k places; it is 
calculated using the n — k pairs of points 

(Ge, Te) (xo, nee TET e T 
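The definition in the box can be sketched in Python. The helper names `lag_series` and `pearson_autocorr` and the example data are illustrative assumptions, not part of the text. Because the exact formula is omitted here, the sketch uses the ordinary Pearson correlation over the n − k pairs, which the text says is very similar to the sample autocorrelation in large samples.

```python
from math import sqrt

def lag_series(x, k):
    """Return x lagged by k places; None marks the k missing positions."""
    if k == 0:
        return list(x)
    return [None] * k + list(x[:-k])

def pearson_autocorr(x, k):
    """Pearson correlation over the n - k pairs (x_t, x_{t+k})."""
    lagged = lag_series(x, k)
    # Drop the first k positions, where the lagged series has no value.
    pairs = [(a, b) for a, b in zip(lagged, x) if a is not None]
    m = len(pairs)
    mean_a = sum(a for a, _ in pairs) / m
    mean_b = sum(b for _, b in pairs) / m
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / sqrt(var_a * var_b)

series = [2, 4, 6, 8, 10, 12]        # hypothetical data
print(lag_series(series, 2))         # [None, None, 2, 4, 6, 8]
print(pearson_autocorr(series, 1))
```

Using `None` for the empty positions mirrors the star notation used earlier for a lagged series.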





Activity 9.1 Lagged time series and autocorrelations 


The time series $X_t$ takes the ten values 5, 1, 4, -9, 3, -3, 7, 0, -1, 8.
(a) Obtain the time series lagged by five places. 


(b) From which pairs of values are the sample autocorrelations r3 and re 
calculated? 


(c) In any time series, the autocorrelation at lag 0 is equal to 1. Explain why this
is so.


In a time series with n values, it is possible to calculate sample autocorrelations at
lags 0, 1, ..., n − 2, though in practice only the first few lags are needed. A useful
way to present the sample autocorrelations is in a bar chart with the lags on the
horizontal axis. This bar chart is called the correlogram or sample ACF plot.

ACF is an abbreviation for autocorrelation function; this will be defined in
Subsection 9.2.
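As a rough illustration of a correlogram, the sketch below computes sample autocorrelations and prints a crude text bar chart. The estimator in `sample_acf` is the usual textbook one (lagged cross-products divided by the sum of squared deviations); it is an assumption here, since the text omits the formula. The data are made up, and the chart lists lags down the page rather than along a horizontal axis.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag (standard estimator, assumed)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]

def text_correlogram(acf, width=20):
    """Return one line per lag with a bar proportional to |r_k|."""
    lines = []
    for k, r in enumerate(acf, start=1):
        bar = ("+" if r >= 0 else "-") * round(abs(r) * width)
        lines.append(f"lag {k:2d} {r:+.2f} {bar}")
    return lines

errors = [0.3, 0.5, 0.1, -0.2, -0.4, -0.1, 0.2, 0.4, 0.0, -0.3]  # made-up data
for line in text_correlogram(sample_acf(errors, 4)):
    print(line)
```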


Example 9.2 British Government securities: the correlogram 


In Example 9.1, the time series of 1-step ahead forecast errors for monthly 
percentage yields on Government securities was discussed. The autocorrelation at 
lag 1 is 0.34, and the autocorrelation at lag 2 is 0.045. The autocorrelations at 
higher lags can be calculated in a similar way. However, rather than calculating 
and discussing each autocorrelation individually, it is more useful to show them 
together in a correlogram. Figure 9.3 shows the correlogram for the time series of
1-step ahead forecast errors for lags 1 to 20.

Figure 9.3 Correlogram for the forecast errors

The correlogram shown in Figure 9.3 gives an overall picture of correlations at
different lags: the largest autocorrelation (in absolute value) is at lag 1, and the
autocorrelations at other lags are all much closer to zero. Since the
autocorrelation at lag 1 is positive, the correlogram in Figure 9.3 shows that the
forecast error at time t is positively correlated with the forecast error at time
t − 1. Thus, for example, if today’s forecast is too low, it is likely that the forecast
for tomorrow will be too low. As noted in Example 9.1, this suggests that it
should be possible to improve on the accuracy of the forecasts produced by this
forecasting method.


Correlograms are often not easy to interpret. However, you should look for the 
following two types of features. 


• Sample autocorrelations with relatively large positive or negative values,
especially for small lags, or lags that can readily be interpreted (for example,
lag 4 in a quarterly series, or lag 12 in a monthly series).


• Patterns among the autocorrelations: for example, whether a clump of
positive values is followed by a clump of negative values.


It is important not to over-interpret a correlogram, that is, to make too much of 
minor features. A useful rule of thumb is to identify one, or at most two, key 
features, if there are any. 


Activity 9.2 Average annual temperatures, 1901–2004


In Example 7.3, Holt’s exponential smoothing method was applied to the time
series of annual average temperatures in Central England for 1901 to 2004. The
correlogram for the forecast errors at lags 1 to 20 is shown in Figure 9.4.

Figure 9.4 Correlogram for forecast errors for temperatures, 1901–2004


(a) In your view, do any lags stand out as corresponding to particularly strong 
autocorrelations? Is there any regular pattern? In one sentence, summarize 
the autocorrelations between the forecast errors in this time series. 





(b) Comment briefly on what, if anything, this suggests about the possibility of
improving upon these forecasts.

9.2 Tests for zero autocorrelation


The sample autocorrelations described in Subsection 9.1 are estimates of
population autocorrelations. The (population) autocorrelation at lag k, which is
denoted $\rho_k$, is the autocorrelation between $E_{t-k}$ and $E_t$. It is assumed that $\rho_k$
depends only on k and, in particular, that it does not depend on t. The
(population) autocorrelations $\rho_k$, k = 1, 2, ..., define the autocorrelation
function, or ACF, of the time series. The ACF is discussed in more
detail in Part III.

The sample autocorrelation $r_k$ is the estimated value of $\rho_k$, so

$$r_k = \hat{\rho}_k.$$





If there is evidence of autocorrelation between the 1-step ahead forecast errors, 
then the forecasting method used is not the best possible: it could (in principle) 
be improved by using the information revealed by the autocorrelations to improve 
the forecasts. 


Owing to random variation, even if $\rho_k = 0$ for $k \geq 1$, it is unlikely that the sample
autocorrelation $r_k$ will be exactly zero. In order to decide whether it is reasonable
to conclude that $\rho_k = 0$, a significance test of the null hypothesis $\rho_k = 0$ is
required.


Provided that the 1-step forecast errors have constant variance, then under the
null hypothesis $\rho_k = 0$, it can be shown that the distribution of the sample
autocorrelation calculated from a time series with n time points is approximately
normal with mean 0 and variance 1/n:

$$r_k \approx N\left(0, \frac{1}{n}\right).$$

The proof of the validity of this approximation is omitted.

This null distribution can be used to calculate a p value and hence test the null
hypothesis $\rho_k = 0$. In practice, p values are seldom calculated. Instead, a simple
graphical method is used. This is motivated as follows.


If the null hypothesis is true, the probability of obtaining a sample
autocorrelation within the interval defined by the limits $\pm 1.96/\sqrt{n}$ is 0.95. Thus,
if $r_k$ lies outside the interval

$$\left(-1.96/\sqrt{n},\ 1.96/\sqrt{n}\right),$$

then the significance probability of the test is less than 0.05 (p < 0.05), and hence
it may be concluded that there is at least moderate evidence that the
autocorrelation at lag k is different from zero.








The values $\pm 1.96/\sqrt{n}$ can be represented conveniently on the correlogram by
horizontal lines. These horizontal lines are called significance bounds. The
significance bounds greatly facilitate the interpretation of the correlogram. This is
illustrated in Example 9.3.


Example 9.3 Forecasting the FTSE100 index


Holt’s exponential smoothing method was used to obtain the 1-step ahead
forecasts for the monthly series of logarithms of the FTSE100 index of share
prices (the ‘footsie’ index) between January 1988 and January 2005. The
autocorrelations of the forecast errors may be investigated as follows.

These data were discussed in Exercise 4.1.

The first step is to check that the forecast errors have (approximately) mean zero
and constant variance. The time plot of the forecast errors is shown in
Figure 9.5(a).

Figure 9.5 Forecast errors: (a) time plot (b) correlogram


The forecast errors appear to be centred on zero, and there is no obvious change 
in the level of the series. The width of the fluctuations does not vary 
systematically, so there is no reason to doubt the assumption of constant variance. 


The correlogram for lags 1 to 20 is shown in Figure 9.5(b). The two horizontal
lines on either side of the central line represent the significance bounds $\pm 1.96/\sqrt{n}$.
For this series, n = 205, so the bounds are ±0.137.
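The bounds quoted for this series can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the three autocorrelation values in the last line are made up to show how lags outside the bounds would be flagged.

```python
from math import sqrt

def significance_bound(n, z=1.96):
    """Half-width of the approximate 95% bounds +/- z / sqrt(n)."""
    return z / sqrt(n)

def lags_outside_bounds(acf, n, z=1.96):
    """Lags (counted from 1) whose sample autocorrelation lies outside the bounds."""
    bound = significance_bound(n, z)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]

bound = significance_bound(205)
print(round(bound, 3))                                # 0.137, as in Example 9.3
print(lags_outside_bounds([0.05, -0.20, 0.10], 205))  # [2] for these made-up values
```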





None of the sample autocorrelations crosses these bounds. Thus there is little
evidence that any of the underlying population autocorrelations is non-zero.


Activity 9.3 UK precipitation data 


Simple exponential smoothing was used to obtain 1-step ahead forecasts using the
UK precipitation data discussed in Example 6.3. Figure 9.6(a) shows the time plot
of the forecast errors, and Figure 9.6(b) shows the correlogram for lags 1 to 20.

Figure 9.6 Forecast errors for UK precipitation, 1766–2004: (a) time plot (b) correlogram

(a) Does either the level or the variance of the forecast errors change
systematically over time? (You should ignore the first few values of the time
plot, which are affected by the choice of starting values.)


(b) For this time series, n is 239. Calculate the positions of the bounds in 
Figure 9.6(b). 





(c) What does Figure 9.6(b) suggest about the presence of non-zero population 
autocorrelations at lags 1 to 20? 





One problem with assessing the evidence for non-zero autocorrelations using the
significance bounds $\pm 1.96/\sqrt{n}$ is as follows. Suppose that, say, 20 sample
autocorrelations are tested in this way, and that all the underlying
autocorrelations are zero. For each k, the probability of obtaining a sample
autocorrelation lying outside the interval $(-1.96/\sqrt{n}, +1.96/\sqrt{n})$ is 5%. So, on
average, we would expect 1 out of 20 sample autocorrelations to be outside the
significance bounds by chance, even though all the underlying autocorrelations are
zero.
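The arithmetic behind this warning can be made explicit. The sketch below computes the expected number of sample autocorrelations outside the bounds. Under the extra assumption that the 20 tests are independent (which is only approximately true, and is not claimed in the text), it also computes the chance that at least one falls outside.

```python
n_lags = 20
p_outside = 0.05   # chance that one sample autocorrelation falls outside the bounds

expected_outside = n_lags * p_outside
# Assumes the 20 tests are independent, which is only approximately true.
p_at_least_one = 1 - (1 - p_outside) ** n_lags

print(expected_outside)           # 1.0
print(round(p_at_least_one, 2))   # 0.64
```

So even when every population autocorrelation is zero, a correlogram with 20 lags will more often than not show at least one bar outside the bounds.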





An alternative approach is to fix the number k of lags to be considered in
advance, and test the null hypothesis

$$H_0: \rho_1 = \rho_2 = \cdots = \rho_k = 0.$$

Rejection of the null hypothesis means that one or more of $\rho_1, \rho_2, \ldots, \rho_k$ is
non-zero.


A test of a null hypothesis involving several autocorrelations is called a
portmanteau test. The simplest portmanteau test is based on the following test
statistic:

$$Q = n \sum_{j=1}^{k} r_j^2.$$

Large values of the test statistic Q provide evidence against the null hypothesis,
and lead to small p values. There are many variants of this test statistic. A
commonly used version of this portmanteau test is the Ljung–Box test. You
will not be required to calculate the test statistic, only to interpret significance
probabilities, so the details of the test will be omitted. You will use SPSS to carry
out the test in Section 10. The interpretation of p values in this context is
summarized in Table 9.1.
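Although no hand calculation is required, a short sketch may help to show what the statistic measures. The function `box_pierce` implements the simple statistic Q above. The function `ljung_box` uses the standard published form of the Ljung–Box variant, n(n + 2) times the sum of r_j^2/(n − j), which is not given in the text and is included here only for comparison. The autocorrelations and series length are made up.

```python
def box_pierce(acf, n):
    """Q = n * sum of r_j^2 over the supplied lags."""
    return n * sum(r ** 2 for r in acf)

def ljung_box(acf, n):
    """Ljung-Box variant: n(n+2) * sum of r_j^2 / (n - j)."""
    return n * (n + 2) * sum(r ** 2 / (n - j) for j, r in enumerate(acf, start=1))

acf = [0.30, -0.10, 0.05]   # made-up sample autocorrelations at lags 1 to 3
n = 100                     # made-up series length
print(round(box_pierce(acf, n), 2))   # 10.25
print(round(ljung_box(acf, n), 2))
```

The Ljung–Box value is slightly larger than Q because each squared autocorrelation is weighted by (n + 2)/(n − j), which exceeds 1.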


Table 9.1 Interpretation of p values from a portmanteau test for lags 1 to k

Significance probability p    Rough interpretation
p > 0.10                      little evidence of autocorrelation at lags 1 to k
0.10 ≥ p > 0.05               weak evidence of autocorrelation at lags 1 to k
0.05 ≥ p > 0.01               moderate evidence of autocorrelation at lags 1 to k
p ≤ 0.01                      strong evidence of autocorrelation at lags 1 to k


Example 9.4 Testing for non-zero autocorrelation 


Holt’s exponential smoothing was used to obtain 1-step ahead forecasts for the 
annual average temperature in Central England. In Activity 9.2, you found that 
the autocorrelations at lags 1 to 20 between the forecast errors were all ‘small’. 





The value of the test statistic for the Ljung–Box test applied to these 20 sample
autocorrelations is 23.23, and the p value is 0.278. This provides little evidence of
non-zero autocorrelation at lags 1 to 20.
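The p value quoted in Example 9.4 can be reproduced from the chi-squared approximation with k = 20 degrees of freedom. The sketch below is restricted to even degrees of freedom, where the upper tail probability has a simple closed form; the function name is illustrative, and a statistics library would normally be used instead.

```python
from math import exp

def chi2_upper_tail_even_df(q, df):
    """P(chi-squared(df) > q) for even df, via the closed-form Poisson sum."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even df only")
    half = q / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i        # builds (q/2)^i / i! incrementally
        total += term
    return exp(-half) * total

p = chi2_upper_tail_even_df(23.23, 20)
print(round(p, 3))   # 0.278, matching Example 9.4
```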





A portmanteau test combines several tests into one, just as a portmanteau word
combines several words, as in the word smog (smoke and fog).

The interpretation of significance probabilities is discussed in the Introduction to
statistical modelling. Under $H_0$, the Ljung–Box test statistic is approximately
$\chi^2(k)$ for large n.

The correlogram for forecast errors for temperatures, 1901–2004, is shown in
Figure 9.4.


Activity 9.4 British Government securities


The time series of monthly yields on British Government securities was 
introduced in Example 9.1. The forecast errors resulting from using Holt’s 
exponential smoothing to obtain 1-step ahead forecasts were discussed in 
Examples 9.1 and 9.2. 


(a) The value of the test statistic for the Ljung–Box test for lags 1 to 20 is 60.82,
and the p value is less than 0.0005. What do you conclude?


(b) For this time series, n = 252. Calculate the 95% significance bounds for
individual sample autocorrelations.


(c) The correlogram, with 95% significance bounds, is shown in Figure 9.7.

Figure 9.7 Yields on Government securities: the correlogram for forecast
errors





Identify the lags, if any, at which there is evidence that the underlying 
population autocorrelations are non-zero. 


In practice, it is not possible to test the hypothesis that $\rho_k = 0$ for all $k \geq 1$. Thus
both the correlogram and the Ljung–Box test require a decision to be made about
how many lags to consider. Usually, only the first few lags are of interest, together
with the lag corresponding to the seasonal period: for example, for monthly
data, the autocorrelation at lag 12 would be of particular interest. In this book,
with occasional exceptions, the first 20 lags will be considered. This choice is
made entirely on the pragmatic basis that autocorrelations at high lags are
difficult to interpret.


9.3 Prediction intervals for 1-step ahead forecasts 


A 1-step ahead forecast is a point estimate: it gives a single forecasted value. As 
with any estimate, some measure of the uncertainty surrounding it is required. In 
the context of forecasts, such a measure of uncertainty is provided by a 
prediction interval. 


A $100(1 - \alpha)\%$ prediction interval for $X_{n+1}$, given observed values up to and
including $x_n$, is an interval with probability $1 - \alpha$ of containing $X_{n+1}$.

In keeping with standard notation, the Greek letter $\alpha$ is used both in connection
with prediction intervals, and to denote the smoothing parameter of an
exponential smoothing method. Which is referred to should be clear from the
context.

A prediction interval is superficially similar to a confidence interval, with one
important difference. A confidence interval relates to a parameter, but a
prediction interval relates to a random variable, in this case the next value of the
time series $X_t$. Note that it is not necessary to appeal to plausible ranges or
repeated sampling of the time series to interpret a prediction interval: the
definition given here is valid because $X_{n+1}$ is a random variable, not a fixed
parameter.


Example 9.5 Forecasting temperatures 





In Example 7.3, 1-step ahead forecasts of annual average temperatures for Central
England were discussed. Holt’s exponential smoothing method was applied to the
data for 1901 to 2004: the optimal parameter values were $\alpha = 0.05$ and $\gamma = 0.36$,
with SSE = 25.32.


The time plot and the correlogram for the 1-step ahead forecast errors are shown
in Figure 9.8.

Figure 9.8 Forecast errors for annual average temperatures, 1901–2004: (a) time plot (b) correlogram


The time plot suggests that the forecast errors fluctuate around zero with constant
variance. The value of the Ljung–Box test statistic for the autocorrelations up to
lag 20 is 23.23, and the p value is 0.28. This provides little evidence of non-zero
autocorrelations, a conclusion reinforced by the correlogram. This analysis of the
forecast errors does not suggest that the forecasts could be improved upon.





Using this method with the optimal parameter values, the forecasted average
temperature for 2005 is 10.59°C. How accurate is this forecast, assuming that it is
valid to extrapolate to 2005? Some indication of the likely accuracy of the
forecast is provided by the spread of the forecast errors in Figure 9.8(a), which lie
roughly in the range −1°C to +1°C. Very large fluctuations (in relation to the
range of the data) would indicate that past forecasts have been inaccurate, and
hence that the forecast for the next year may also be inaccurate. On the other
hand, small fluctuations might suggest that, since past forecast errors were small,
the forecast for the next year is likely to be accurate.

In Example 9.5, it was suggested that past forecast errors might be used as a
guide to indicate how accurate the forecast for the next time period might be.
Specifically, the distribution of past forecast errors can be used to obtain a
prediction interval for $X_{n+1}$.





Suppose that the forecast errors $E_t$ are normally distributed with mean zero and
variance $\sigma^2$, and that the (population) autocorrelations at all lags $k \geq 1$ are zero.
It follows that $E_{n+1}$, the 1-step ahead forecast error for $X_{n+1}$, is normally
distributed with mean zero and variance $\sigma^2$. Thus

$$E_{n+1} = X_{n+1} - \hat{x}_{n+1} \sim N(0, \sigma^2),$$

and hence

$$X_{n+1} \sim N(\hat{x}_{n+1}, \sigma^2).$$

Let z denote the $(1 - \alpha/2)$-quantile of the standard normal distribution. Then
$X_{n+1}$ lies in the interval $(\hat{x}_{n+1} - z\sigma, \hat{x}_{n+1} + z\sigma)$ with probability $1 - \alpha$. To
obtain a $100(1 - \alpha)\%$ prediction interval, $\sigma$ is replaced by an estimate $\hat{\sigma}$. The
following estimate based on the sum of squared forecast errors will be used:

$$\hat{\sigma} = \sqrt{\frac{SSE}{n}}.$$

Thus an approximate $100(1 - \alpha)\%$ prediction interval for $X_{n+1}$ is $(x^-_{n+1}, x^+_{n+1})$,
where the prediction limits, $x^-_{n+1}$ and $x^+_{n+1}$, are given by

$$x^-_{n+1} = \hat{x}_{n+1} - z\sqrt{\frac{SSE}{n}}, \qquad x^+_{n+1} = \hat{x}_{n+1} + z\sqrt{\frac{SSE}{n}}. \qquad (9.1)$$

This prediction interval is approximate because $\sigma$ has been estimated. The SSE,
and hence $\hat{\sigma}$, also depends on the choice of initial values, though their effect may
be ignored when n is large.


Example 9.6 Prediction interval for next year’s average temperature 


In Example 7.3, Holt’s exponential smoothing method was used to produce 1-step 
ahead forecasts for the annual average temperature in Central England using data 
for 1901 to 2004. The forecasted average temperature for 2005 was 10.59°C. In 
this example, a 95% prediction interval for the average temperature in 2005 will 
be obtained. 











Before calculating the prediction interval, the following assumptions upon which
the prediction limits depend must be checked: the forecast errors are normally
distributed with mean zero and constant variance; and the autocorrelations
between forecast errors are zero at lags $k \geq 1$.


Figure 9.8(a) suggests that the forecast errors fluctuate around zero, with roughly
constant variance. In Example 9.5, you saw that Figure 9.8(b) and the Ljung–Box
test suggest that the autocorrelations are zero, at least until lag 20, and it is
not likely that temperatures in years more than 20 years apart are correlated. A
histogram can be used to check that the distribution of the past forecast errors is
approximately normal: a histogram of the past forecast errors is shown in
Figure 9.9.

Figure 9.9 A histogram of the forecast errors


This histogram suggests that the normal assumption is reasonable. Thus all the
assumptions are satisfied, so a prediction interval for $X_{n+1}$ (corresponding to the
year 2005), based on the forecast $\hat{x}_{n+1} = 10.59$, can be calculated using (9.1).

Formal tests of normality are available, but will not be used in this book.


For a 95% prediction interval, $\alpha = 0.05$, so $1 - \alpha/2 = 0.975$ and hence the
0.975-quantile of the standard normal distribution is required: this is 1.960. The
SSE is 25.32, and this is based on 104 annual temperatures from 1901 to 2004.
Therefore the prediction limits, $x^-_{n+1}$ and $x^+_{n+1}$, are given by

$$x^-_{n+1} = \hat{x}_{n+1} - z\sqrt{\frac{SSE}{n}} = 10.59 - 1.96\sqrt{\frac{25.32}{104}} \approx 9.62,$$

$$x^+_{n+1} = \hat{x}_{n+1} + z\sqrt{\frac{SSE}{n}} = 10.59 + 1.96\sqrt{\frac{25.32}{104}} \approx 11.56.$$

A table of quantiles of the standard normal distribution is given in the Handbook.


The results may be summarized as follows. The forecasted average temperature
for 2005, based on observed temperatures from 1901 to 2004, is about 10.6°C,
with approximate 95% prediction interval (9.6, 11.6).
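The calculation in Example 9.6 can be checked with a short Python sketch of formula (9.1). The function name `prediction_interval` is an illustrative choice. The quantile is taken from the standard library's `statistics.NormalDist` rather than from a table, so z is about 1.95996 instead of the rounded 1.960.

```python
from math import sqrt
from statistics import NormalDist

def prediction_interval(forecast, sse, n, level=0.95):
    """Approximate prediction interval: forecast -/+ z * sqrt(SSE / n)."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # (1 - alpha/2)-quantile
    half_width = z * sqrt(sse / n)
    return forecast - half_width, forecast + half_width

lower, upper = prediction_interval(10.59, 25.32, 104)
print(round(lower, 2), round(upper, 2))   # 9.62 11.56
```

The interval is only meaningful if the assumptions checked in the example hold for the forecast errors.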








The method for calculating prediction intervals is summarized in the following 
box. 


Prediction interval for a 1-step ahead forecast 


Suppose that a 1-step ahead forecast $\hat{x}_{n+1}$ for $X_{n+1}$ has been obtained,
together with SSE, the sum of squared forecast errors at times 1, 2, ..., n.
An approximate $100(1 - \alpha)\%$ prediction interval for $X_{n+1}$ is given by

$$\left(\hat{x}_{n+1} - z\sqrt{\frac{SSE}{n}},\ \hat{x}_{n+1} + z\sqrt{\frac{SSE}{n}}\right),$$

where z is the $(1 - \alpha/2)$-quantile of the standard normal distribution.








The following assumptions should be checked before the prediction interval 
is calculated. 


• The forecast errors are normally distributed with mean zero and
constant variance $\sigma^2$.


• The autocorrelations between the forecast errors are zero at lags $k \geq 1$.
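To see what errors satisfying both assumptions look like, the following sketch simulates independent normal values and counts how many sample autocorrelations fall outside the bounds ±1.96/√n. The seed, the series length of 500 and the `sample_acf` helper (the usual textbook estimator, which the text does not give) are assumptions made purely for illustration.

```python
import random
from math import sqrt

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag (standard estimator, assumed)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]

rng = random.Random(0)                          # fixed seed so the run is repeatable
noise = [rng.gauss(0, 1) for _ in range(500)]   # independent N(0, 1) values

acf = sample_acf(noise, 20)
bound = 1.96 / sqrt(len(noise))
outside = [k for k, r in enumerate(acf, start=1) if abs(r) > bound]
print(f"bounds +/-{bound:.3f}; lags outside: {outside}")
```

With 20 lags, one or two values outside the bounds would not be surprising even for a series generated in this way.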


A time series $X_t$ for which the $X_t$ are normally distributed with mean zero and
constant variance $\sigma^2$, and for which the autocorrelations at lags $k \geq 1$ are all zero,
is called white noise. Thus the assumptions required to calculate a prediction
interval are equivalent to requiring that the time series of forecast errors is white
noise.

Activity 9.5 Predicting a chemical concentration


In Activity 4.2, a time series of the concentration level of a chemical process was
discussed. The time series consists of 197 successive two-hourly readings at
2, 4, ..., 394 hours. Simple exponential smoothing is used to obtain 1-step ahead
forecasts. Figure 9.10 shows the time plot and a histogram of the forecast errors.

Figure 9.10 Forecast errors: (a) time plot (b) histogram


The value of the Ljung–Box test statistic for lags 1 to 20 is 29.16, and the p value
is 0.085. The SSE is 19.89.


(a) Discuss the validity of the assumptions required for calculating a prediction 
interval for the concentration at time 396 hours. 


(b) The forecasted value at time 396 hours is $\hat{x}_{198} = 17.50$. Obtain a 95%
prediction interval for $X_{198}$.


(c) Summarize the results. 


The forecasting methods that have been discussed all depend on the assumption 
that the time series may be described by an additive model. When this is not the 
case, the methods can still be used if a transformation can be found such that the 
transformed time series may be represented by an additive model. The forecasting 
methods can then be applied to the transformed time series to obtain forecasts 
and prediction intervals, and these can be ‘transformed back’ to give forecasts and 
prediction intervals on the scale of the original time series. This idea is illustrated 
in Example 9.7. 





Example 9.7 Visits abroad 


In Activity 1.1, the monthly time series of numbers of (thousands of) visits 
abroad by UK residents for 1980 to 2004 was described. In Example 2.4, it was 
suggested that an additive model may be appropriate to represent the time series 
of square roots of the numbers of visits. 


The Holt–Winters method was used to obtain a forecast for January 2005, based
on data up to December 2004. The forecasted value, on the square root scale,
was 62.50, with 95% prediction interval (59.06, 65.94). However, a forecast and
prediction interval are required on the original scale.

Values on the square root scale can be transformed to give values on the original
scale by taking squares. Thus the forecasted number of thousands of visits abroad
for January 2005 is $62.50^2 \approx 3906$, and the 95% prediction limits (in thousands)
are $59.06^2 \approx 3488$ and $65.94^2 \approx 4348$.


The interpretation of the prediction limits is as follows: the probability that the
number of visits abroad in January 2005 will lie between 3 488 000 and 4 348 000
is 0.95.
The method illustrated in Example 9.7 will work for any monotonic
transformation and, in particular, for any increasing transformation. Suppose
that the time series on the original scale is $Y_t$, and that an increasing function g is
used to transform $Y_t$ to give the time series $X_t = g(Y_t)$. If a forecast $\hat{x}_{n+1}$ and a
prediction interval $(x^-_{n+1}, x^+_{n+1})$ are obtained on the transformed scale, then on
the original scale, the forecast is $\hat{y}_{n+1}$, where

$$g(\hat{y}_{n+1}) = \hat{x}_{n+1},$$

and the prediction limits are $y^-_{n+1}$ and $y^+_{n+1}$, where

$$g(y^-_{n+1}) = x^-_{n+1}, \qquad g(y^+_{n+1}) = x^+_{n+1}.$$

So the forecast $\hat{y}_{n+1}$ and the prediction limits $y^-_{n+1}$ and $y^+_{n+1}$ can be obtained by
applying the inverse of the function g to $\hat{x}_{n+1}$, $x^-_{n+1}$ and $x^+_{n+1}$.

A transformation is monotonic if its graph is either increasing or decreasing.


This method will work with the logarithm transformation and the power 
transformations used in this course. For example, if the log transformation is used 
to transform a time series, then a forecast and prediction limits on the log scale 
can be transformed back to the original scale using the exponential function. 
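Transforming back can be sketched in a few lines of Python. The helper `back_transform` is an illustrative name. The first call reuses the square root scale values from Example 9.7, and the second uses made-up log scale values simply to show the exponential function acting as the inverse.

```python
from math import exp

def back_transform(forecast, lower, upper, inverse):
    """Apply the inverse of an increasing transformation g to a forecast and its limits."""
    return inverse(forecast), inverse(lower), inverse(upper)

# Square root scale values from Example 9.7; the inverse of the square root is squaring.
f, lo, hi = back_transform(62.50, 59.06, 65.94, lambda v: v ** 2)
print(round(f), round(lo), round(hi))   # 3906 3488 4348

# For a log transformation the inverse is exp (made-up values).
print(back_transform(2.0, 1.5, 2.5, exp))
```

Because the inverse of an increasing function is itself increasing, the lower limit stays below the forecast and the upper limit stays above it.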


Activity 9.6 Predicting the FTSE100 index 


In Example 9.3, the logarithms of the FTSE100 index values between
January 1988 and January 2005 were analysed using Holt’s exponential
smoothing.





Using this method, the predicted value for February 2005, on the log scale, 

is 8.490. The SSE is 0.3737, calculated using 205 observations to January 2005. 
You may assume that the forecast errors are white noise; that is, they are
normally distributed with mean zero and constant variance, and the
autocorrelations are zero at lags $k \geq 1$.


(a) Obtain an approximate 99% prediction interval for the log FTSE100 index 
for February 2005. 


(b) Obtain a forecast and 99% prediction interval for the FTSE100 index for 
February 2005. 


Summary of Section 9 


In this section, the sample autocorrelations of the 1-step ahead forecast errors 
have been defined. You have learned how to use these to investigate the 
performance of a forecasting method. The correlogram and the Ljung–Box test
for zero autocorrelation have been discussed. The calculation of approximate 
prediction intervals for 1-step ahead forecasts has been described. You have 
learned how to check the assumptions required for these calculations. 







Exercises on Section 9 


Exercise 9.1 Forecast errors for house prices 


The Holt–Winters method was applied to the data on the logarithms of monthly
average house prices between January 1996 and January 2005.

See Activity 7.1.


(a) The series comprises 109 time points. The sample autocorrelation of the
1-step ahead forecast errors at lag 12 is $r_{12} = -0.244$. Calculate the
significance bounds, and hence evaluate the evidence against the null
hypothesis that $\rho_{12} = 0$.





(b) The value of the Ljung–Box test statistic for autocorrelations at lags 1 to 20
is 26.55, and the p value is 0.148. What do you conclude from this?


(c) The correlogram for lags 1 to 20 is shown in Figure 9.11.

Figure 9.11 Correlogram for forecast errors for log house prices


Explain why the correlogram does not necessarily contradict the finding of 
part (b). 


Exercise 9.2 Prediction interval for house prices 


This exercise is based on the data and smoothing method described in
Exercise 9.1. The time plot and a histogram of the 1-step ahead forecast errors
are shown in Figure 9.12.

Figure 9.12 Forecast errors for the logarithms of average house prices: (a) time plot (b) histogram

The forecasted value of the logarithm of the average house price for February 2005
is 11.937. The SSE, calculated using 109 values, is 0.006 21.


(a) You may assume that the autocorrelations at lags $k \geq 1$ are zero (see
Exercise 9.1). Use Figure 9.12 to check the remaining assumptions required
to use (9.1) to calculate a prediction interval for the February 2005 value.


(b) Calculate an approximate 95% prediction interval for $X_{110}$, the logarithm of
the average house price in February 2005.





(c) Obtain a forecast and 95% prediction interval for the average house price in 


February 2005, and summarize your results. Give your answers to the nearest 
£100. 





10 Autocorrelation and model checking in SPSS 


In this section, you will learn how to use SPSS to obtain the correlogram for the
1-step ahead forecast errors, to carry out the Ljung–Box test, and to check
normality assumptions.


Refer to Chapter 6 of Computer Book 2 for the work in this section. 





Summary of Section 10 


In this section, you have learned how to use SPSS to obtain the correlogram for
the 1-step ahead forecast errors, and to carry out the Ljung–Box test. You have
also learned how to superimpose a normal curve on a histogram to check
normality.

Introduction to Part III


In Parts I and II, moving averages were used to estimate the components of a
time series $X_t$ that can be described adequately by an additive decomposition
model, and to obtain forecasts using exponential smoothing. The additive
decomposition model specifies how the trend, the seasonal component and the
irregular component are combined, but otherwise makes no assumptions about
the correlations between successive values of $X_t$. Avoiding any such assumptions
is both a strength and a weakness. It is a strength because the forecasting 
methods described will work whatever the underlying correlation structure of the 
time series. But it is also a weakness, since the correlation structure can help 
throw light on the process generating the data, and can be used to produce more 
accurate forecasts. 





In Section 9, you learned how to obtain prediction intervals for 1-step ahead 
forecasts obtained by exponential smoothing. These prediction intervals require 
that the 1-step ahead forecast errors are uncorrelated, and normally distributed. 
But if they are not, then more general methods of analysis are required. 


In Part III, the additive decomposition model is extended to include a statistical
model for the irregular component $W_t$ that allows explicitly for non-zero
autocorrelations. A family of models known as integrated autoregressive moving
average models, or ARIMA models, is discussed. These models were popularized
in the 1960s by George Box and Gwilym Jenkins, and for this reason are
sometimes also referred to as Box–Jenkins models.


ARIMA modelling of time series involves some advanced mathematics. All of the
more difficult mathematics will be sidestepped so as to concentrate on the
practical aspects of the models. To keep matters simple, only non-seasonal time
series will be considered. In Section 11, an important family of time series, the
stationary series, is introduced. In Section 12, a class of models known as
autoregressive models is discussed. In Section 13, a further class of models, the
moving average models, is described. In Section 14, autoregressive models and
moving average models are brought together as ARIMA models. Finally, in
Section 15, you will learn how to use SPSS to analyse time series using ARIMA
models.

ARIMA models can be extended to cope with seasonality, but these extensions
will be omitted.

In this section, the family of stationary time series is introduced. The models
described in subsequent sections apply to stationary time series. Thus, if a time
series is not stationary, the first step in choosing a model for it will be to
transform it into a stationary series.


In Subsection 11.1, stationary time series are defined and illustrated. Then, in 
Subsection 11.2, methods for transforming time series into stationary time series 
are discussed. One method of particular importance is called differencing. In 
Subsection 11.3, you will learn how to use SPSS to obtain stationary series. 


11.1 Stationarity 


In general terms, a time series is said to be stationary if its basic statistical 
properties do not vary over time. Stationarity is an important idea in time series 
analysis. It is important because it provides a basis for forecasting: if the 
statistical properties of the time series do not change, then there is some chance of 
obtaining good forecasts by extrapolation (though this is never guaranteed). 


The most basic of statistical properties are the mean and variance. A time series 
$X_t$ is said to be stationary in mean if it has constant mean: 

$$E(X_t) = \mu.$$

In particular, this implies that there is no increasing or decreasing trend, and no 
seasonality (or any other cyclic variation). 

A time series is said to be stationary in variance if it has constant variance: 

$$V(X_t) = \sigma^2.$$

This means that the size of the irregular fluctuations must be roughly the same at 
every time point. Note that a time series that is stationary both in mean and in 
variance can be written in the additive form 

$$X_t = \mu + W_t,$$

where the irregular component $W_t$ has mean zero and variance $\sigma^2$. 
Example 11.1 Stationarity in mean and in variance 


The time plots of four time series are shown in Figure 11.1. 

Figure 11.1 Four time series 


The time series in Figure 11.1(a) displays a trend, though the variance of the 
irregular component appears constant: this time series is stationary in variance, 
but it is not stationary in mean. Figure 11.1(b) shows a time series with no trend, 
but decreasing variance: this time series is stationary in mean but it is not 
stationary in variance. Figure 11.1(c) shows a time series with increasing trend 
and variance: this time series is stationary neither in mean nor in variance. In 
fact, it can be described by a multiplicative model. Figure 11.1(d) shows a time 
series with no trend and constant variance: this time series is stationary in both 
mean and variance. 


Deciding whether a time series is stationary in mean and in variance is done most 
easily by examining a time plot. In fact, you have already done this: in 
Subsection 1.2, you examined time plots to identify trends, and in Subsection 9.3, 
to check that the variance of the 1-step ahead forecast errors is constant. 


A further type of stationarity is important in time series analysis. This is the 
requirement that the autocorrelation between $X_t$ and $X_{t-k}$ does not vary with $t$, 
but depends only on the lag $k$. This characteristic is called stationarity in 
correlation. Stationarity in correlation is required to define the autocorrelation 
function $\rho_k$, which is the underlying autocorrelation between $X_t$ and $X_{t-k}$. 


In Subsection 9.2, it was assumed that the autocorrelation between the 1-step 
ahead forecast errors at times $t$ and $t - k$ depended only on the lag $k$. In other 
words, it was assumed that the forecast errors were stationary in correlation. 


Stationarity in correlation cannot usually be checked by inspecting a time plot. 
For this reason, it is usually assumed that a time series is stationary in 
correlation, unless there is good reason to believe it is not — for instance, because 
of a change in the process generating the data. 
A time series is said to be stationary if it is stationary in mean, in variance and 
in correlation. This definition is set out in the following box. (There are other 
definitions of stationarity. The one given here is sometimes called weak 
stationarity.) 


Stationarity 

A time series $X_t$ is stationary if it satisfies the following conditions. 
• $E(X_t) = \mu$ (constant mean). 
• $V(X_t) = \sigma^2$ (constant variance). 
• For all $k$, $\rho_k$, the autocorrelation between $X_t$ and $X_{t-k}$, depends only 
  on the lag $k$. 


Example 11.2 White noise 


In Subsection 9.3, the term white noise was defined: a time series $X_t$ is said to be 
white noise if $X_t \sim N(0, \sigma^2)$ and successive terms are uncorrelated. It follows that 
for white noise, the autocorrelations $\rho_k$ are such that $\rho_k = 0$ for $k \ge 1$. White 
noise is therefore a stationary time series. 
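
The behaviour described in Example 11.2 can be explored numerically. The following Python sketch is not part of this book, which uses SPSS throughout; the seed, the series length and the function name `sample_acf` are illustrative assumptions. It simulates white noise, computes sample autocorrelations using the usual definition, and compares them with approximate 95% bounds of $\pm 1.96/\sqrt{n}$, which are assumed here to be the kind of significance bounds drawn on a correlogram.

```python
import math
import random


def sample_acf(x, max_lag):
    """Return the sample autocorrelations r_1, ..., r_max_lag of the list x."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)  # sum of squared deviations
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


rng = random.Random(1)   # fixed seed (an arbitrary choice) for reproducibility
n = 500                  # hypothetical series length
white_noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

acf = sample_acf(white_noise, 10)
bound = 1.96 / math.sqrt(n)  # assumed approximate 95% significance bound
outside = sum(abs(r) > bound for r in acf)

print([round(r, 3) for r in acf])
print("bound:", round(bound, 3), "lags outside bounds:", outside)
```

For a white noise series, most of the printed sample autocorrelations should lie within the bounds, although by chance roughly one in twenty may fall just outside.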





Non-stationarity induces patterns in the autocorrelations that make them even 
more difficult to interpret. This is illustrated in Example 11.3. 


Example 11.3 Non-stationarity and autocorrelations 


Figure 11.1(a) was obtained by adding a trend component to the time series shown 
in Figure 11.1(d). Figure 11.2 shows the effect of this trend on the correlogram. 
Figure 11.2 Correlograms for two time series 


Figure 11.2(a) shows the correlogram for the time series in Figure 11.1(d). There 
is a relatively large positive sample autocorrelation at lag 1 — it exceeds the 
upper significance bound — but the other sample autocorrelations lie within or 
only just cross the significance bounds: it is reasonable to conclude that 
autocorrelations at lags greater than 1 are zero. Figure 11.2(b) shows the 
correlogram for the time series in Figure 11.1(a). 





Adding a trend to the time series changes the sample ACF completely: now all 
sample autocorrelations up to lag 20 (and well beyond) are large and positive, and 
exceed the significance bounds. (Recall that ACF is an abbreviation for 
autocorrelation function.) The autocorrelations are large and positive 
because, owing to the trend, a small value $x_t$ tends to be preceded by small values 
at times before $t$, while a large value $x_t$ tends to be followed by large values at 
times after $t$. 
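
The effect described in Example 11.3 can be imitated with simulated data. The sketch below is a rough illustration only: it uses white noise as the stationary series (the series in Figure 11.1(d) is not available here), and the slope, seed and series length are hypothetical choices. It adds a linear trend and compares the sample autocorrelations before and after.

```python
import random


def sample_acf(x, max_lag):
    """Return the sample autocorrelations r_1, ..., r_max_lag of the list x."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


rng = random.Random(2)   # arbitrary fixed seed
n = 200                  # hypothetical series length
stationary = [rng.gauss(0.0, 1.0) for _ in range(n)]

slope = 0.05             # hypothetical trend slope per time step
trended = [slope * t + w for t, w in enumerate(stationary)]

acf_stationary = sample_acf(stationary, 20)
acf_trended = sample_acf(trended, 20)

print("without trend:", [round(r, 2) for r in acf_stationary[:5]])
print("with trend:   ", [round(r, 2) for r in acf_trended[:5]])
```

With these settings, the sample autocorrelations of the trended series should be large and positive at every lag shown, whereas those of the series without a trend should be close to zero.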
It is generally impossible to interpret the correlogram for a non-stationary time 
series in any useful way. For this reason, correlograms should be used primarily to 
investigate stationary time series. Activity 11.1 provides another example of the 
effect of non-stationarity on the correlogram. 


Activity 11.1 Correlograms and seasonality 


The time plots of two monthly time series are shown in Figures 11.3(a) 
and 11.3(b). The correlograms corresponding to the time series in Figures 11.3(a) 
and 11.3(b) are shown in Figures 11.3(c) and 11.3(d), respectively. 
Figure 11.3 Two time plots and corresponding correlograms 


(a) Use the time plots in Figures 11.3(a) and 11.3(b) to decide whether each of 
the time series is stationary or non-stationary. Explain your decision in each 
case. 


(b) The time series in Figure 11.3(b) was obtained from that in Figure 11.3(a) by 
adding a component to it. In general terms, what sort of component do you 
think was added? 





(c) Describe the main difference between the two correlograms. Explain in 
general terms why the transformation you identified in part (b) has produced 
the large positive sample autocorrelations at lags 12 and 24, and the large 
negative sample autocorrelations at lags 6 and 18, shown in Figure 11.3(d). 
11.2 Differencing time series 


Suppose that a non-seasonal time series $X_t$ can be described using an additive 
model of the form 

$$X_t = m_t + W_t,$$

where $m_t$ is the trend component and $W_t$ is the irregular component with mean 
zero and constant variance. (Throughout the rest of Part III, only non-seasonal 
time series are considered.) A standard method of analysis is to estimate the 
trend component as described in Part I, then subtract the estimated trend from 
the series, and thus obtain an estimate of the irregular component. Provided that 
the trend has been fully removed, the estimated irregular component can be 
assumed to be stationary, and hence its correlation structure can be investigated 
using a correlogram. 


A second approach is to remove the trend without estimating it explicitly. This is 
achieved by differencing the series, as follows. Suppose first that the trend 
component is linear, so that for some constants $m$ and $b$, 

$$m_t = m + bt.$$

The idea behind differencing is to replace $X_t$ by $Y_t$, the difference between $X_t$ 
and $X_{t-1}$: 

$$\begin{aligned}
Y_t &= X_t - X_{t-1} \\
    &= (m + bt + W_t) - (m + b(t-1) + W_{t-1}) \\
    &= (m + bt + W_t) - (m + bt + W_{t-1} - b) \\
    &= b + W_t - W_{t-1} \\
    &= b + W'_t,
\end{aligned}$$

where $W'_t = W_t - W_{t-1}$. Note that the time series $Y_t$ has constant level (since $b$ is 
a constant) and irregular component $W'_t$. The new time series $Y_t$ is called the 
series of first differences of $X_t$, or alternatively the time series of differences 
of order 1. 


The time series of first differences is obtained because, although the original time 
series $X_t$ is not stationary in mean, the series of first differences is stationary in 
mean. 


Example 11.4 First differences 


The time series of monthly percentage yields on British Government securities 
between 1950 and 1970 is reproduced in Figure 11.4(a). (See Example 9.1.) 

Figure 11.4 Monthly yields: (a) original data (b) first differences 

The original time series is known to be non-seasonal, but it has a clear rising 
linear trend. Table 11.1 shows the first few values of the time series and the 
corresponding first differences. 


Table 11.1 Monthly yields and first differences, 
January 1950–June 1950 

Month      Yield   First differences 
January    2.22    - 
February   2.23    0.01 
March      2.22    -0.01 
April      2.20    -0.02 
May        2.09    -0.11 
June       1.97    -0.12 


The first difference for January 1950 cannot be calculated from these data, since 
the value for December 1949 is not available. So the corresponding cell is left 
blank in Table 11.1. The first difference for February 1950 is 


$$y_2 = x_2 - x_1 = 2.23 - 2.22 = 0.01.$$


The other first differences are calculated similarly. The time series of first 
differences is shown in Figure 11.4(b). The increasing trend has been removed, 
and the resulting time series can be assumed to be stationary. 
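
The arithmetic of Example 11.4 can be reproduced in a few lines of Python. This is a minimal sketch rather than the method used in this book (which relies on SPSS); the function name `difference` is a hypothetical helper, and the values are the six yields from Table 11.1.

```python
def difference(x):
    """Return the first differences y_t = x_t - x_(t-1) for t = 2, ..., n."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]


# Monthly yields for January to June 1950, taken from Table 11.1.
yields = [2.22, 2.23, 2.22, 2.20, 2.09, 1.97]

# Round to two decimal places to remove floating-point noise.
first_differences = [round(d, 2) for d in difference(yields)]
print(first_differences)
```

The list has one fewer entry than the original series, because no first difference can be calculated for January 1950.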


So far it has been assumed that the original series $X_t$ has a linear trend. But 
what happens if the trend is curved? In that case, the series of first differences 
might not be stationary in mean, but it will be ‘less curved’ than the original 
series. So the procedure is repeated. A third series, $Z_t$, is obtained by taking the 
first differences of the $Y_t$: 

$$\begin{aligned}
Z_t &= Y_t - Y_{t-1} \\
    &= (X_t - X_{t-1}) - (X_{t-1} - X_{t-2}) \\
    &= X_t - 2X_{t-1} + X_{t-2}.
\end{aligned}$$

This time series is called the series of second differences of $X_t$, or the time 
series of differences of order 2. (Note that the series of second differences is not 
the same as the series $X_t - X_{t-2}$.) If the original time series $X_t$ has a quadratic 
trend component, 

$$m_t = m + bt + at^2,$$

then it can be shown that the series of second differences will be stationary in 
mean. The key point is that you can keep taking differences in this way until you 
obtain a series that is stationary in mean, and that you can do so without 
estimating the trend component of the original time series. 
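
The claim about quadratic trends can be checked on a small example. The sketch below applies differencing twice to a purely quadratic trend with no irregular component; the coefficients and the helper name `difference` are hypothetical choices made for illustration. It also computes the series $X_t - X_{t-2}$ to show that this is not the same as the series of second differences.

```python
def difference(x):
    """Return the first differences x_t - x_(t-1)."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]


# Hypothetical quadratic trend m_t = m + b*t + a*t^2, with no irregular component.
m, b, a = 10.0, 2.0, 0.5
trend = [m + b * t + a * t ** 2 for t in range(8)]

first = difference(trend)    # still increases with t
second = difference(first)   # constant, equal to 2a

# The lag-2 difference x_t - x_(t-2) is a different series.
lag_two = [trend[t] - trend[t - 2] for t in range(2, len(trend))]

print("first differences: ", first)
print("second differences:", second)
print("x_t - x_(t-2):     ", lag_two)
```

Here the first differences still increase steadily, while the second differences are constant, so the quadratic trend has been removed.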
(In fact, the series of first differences in Example 11.4 was used as the basis 
for Figures 11.1 and 11.3.) 


Section 11 Stationary time series 


Example 11.5 The UK index of production 


The time plot of the (seasonally adjusted) quarterly UK index of production, 
between the first quarter of 1990 and the first quarter of 2005, is shown in 
Figure 11.5. (These data were obtained in June 2005 from the National 
Statistics website www.statistics.gov.uk.) 

Figure 11.5 UK index of production 


The series is non-seasonal, since the seasonal component has been removed by 
seasonal adjustment. The irregular fluctuations do not vary in size, so the series 
may be described using an additive model. However, the series is not stationary, 
as there is clear variation in the level of the series over time. An initial drop 
(between 1990 and 1992) is followed by a rise until about 2001, then followed by 
another drop. The trend is certainly not linear. 


The time plot of the series of first differences is shown in Figure 11.6(a). 
Figure 11.6 Index of production: (a) first differences (b) second differences 


This time plot suggests that the series of first differences may not be stationary in 
mean: after the first value, there is an increasing trend until 1993, followed by a 
more gradual decline. However, the variation in the level is much less marked 
than for the original data. Thus the first differences have reduced, but not 
completely removed, the trend. 


The time plot of the series of second differences is shown in Figure 11.6(b). This 
time series is clearly stationary in mean. There is no systematic increase or 
decrease in the variance, so the series also appears to be stationary in 
variance. 


Activity 11.2 Obtaining a differenced time series 


Table 11.2 contains the first six values of the time series of the UK index of 
production discussed in Example 11.5. 


Obtain the first and the second differences corresponding to these values. 


In Example 11.5, you saw that stationarity in mean can be obtained by repeated 
differencing of the time series. However, note that you should not difference a 
time series more times than is necessary to obtain approximate stationarity. Once 
approximate stationarity has been achieved, further differencing is unnecessary 
and may make it more difficult to model the time series. Differencing too much is 
called over-differencing, and should be avoided. It is seldom necessary to use 
differencing of order greater than 2. 
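
One way to see why over-differencing is unhelpful is to difference a series that is already white noise. For white noise $W_t$ with variance $\sigma^2$, the differenced series $W_t - W_{t-1}$ has variance $2\sigma^2$ and lag 1 autocorrelation $-\frac{1}{2}$. This is a standard result quoted here without proof; it is not taken from this book. The Python sketch below illustrates it by simulation, with the seed, series length and function names chosen arbitrarily.

```python
import random


def difference(x):
    """Return the first differences x_t - x_(t-1)."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]


def variance(x):
    """Return the variance of x (divisor n)."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def lag1_autocorrelation(x):
    """Return the sample autocorrelation of x at lag 1."""
    mean = sum(x) / len(x)
    c0 = sum((v - mean) ** 2 for v in x)
    c1 = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, len(x)))
    return c1 / c0


rng = random.Random(3)  # arbitrary fixed seed
noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # already stationary
over_differenced = difference(noise)                 # unnecessary differencing

variance_ratio = variance(over_differenced) / variance(noise)
r1 = lag1_autocorrelation(over_differenced)
print(round(variance_ratio, 2), round(r1, 2))
```

The two printed values should be close to 2 and $-0.5$ respectively: the unnecessary differencing has increased the variability and introduced correlation that was not present in the original series.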


Activity 11.3 Differencing the Central England temperatures 


The time plot of the annual average Central England temperatures for 1901 to 
2004 is shown in Figure 11.7. 
Figure 11.7 Annual average temperatures in Central England, 1901–2004 


The time plots of the first differences and the second differences are shown in 
Figures 11.8(a) and 11.8(b), respectively. 
Table 11.2 Index of production 

Period             First difference   Second difference 
Quarter 1, 1990 
Quarter 2, 1990 
Quarter 3, 1990 
Quarter 4, 1990 
Quarter 1, 1991 
Quarter 2, 1991 


Figure 11.8 Central England temperatures: (a) first differences (b) second differences 


(a) Briefly describe the time series in Figure 11.7, and the effects of taking first 
differences and second differences. 


(b) Which order of differencing is appropriate to achieve approximate 
stationarity in mean for the time series of Central England temperatures? 


(c) Explain why the time plot corresponding to your answer to part (b) suggests 
that the differenced series is stationary. 





11.3 Differencing in SPSS 


In this subsection, you will learn how to use SPSS to produce time plots of first 
differences and differences of higher order so as to determine the order of 
differencing required to produce a time series that is stationary in mean. You will 
also learn how to transform a time series using logarithms and obtain time plots 
of differences of the transformed time series. 


Refer to Chapter 7 of Computer Book 2 for the work in this 
subsection. 





Summary of Section 11 





In this section, stationarity of a time series has been defined. You have learned 
how to recognize non-stationarity in mean and in variance from a time plot. The 
method of differencing has been introduced. You have learned how to use SPSS to 
difference a time series and, if necessary, transform it using logarithms in order to 
obtain approximate stationarity. 
Exercises on Section 11 


Exercise 11.1 Stationarity in mean and variance 


The time plot of a non-seasonal time series is shown in Figure 11.9. 
Figure 11.9 A non-seasonal time series 


Is the time series stationary in mean? Is it stationary in variance? Explain your 
answers. 


Exercise 11.2 Differencing a series 


The time series in Figure 11.9 was obtained by differencing. Table 11.3 contains 
the first six values of the original (undifferenced) time series. 

Table 11.3 The original time series 

Time   Value 
1      1790.8 
2      1768.8 
3      1742.5 
4      1802.2 
5      1784.4 
6      1857.6 


Obtain the second differences corresponding to these values. 
12 Autoregressive models 


In this section, an important family of models for stationary time series, the 
autoregressive models, is introduced. (In this section, all time series are 
assumed to be stationary.) Autoregressive models are classified by their 
order. In Subsection 12.1, the autoregressive model of order 1 is defined. The 
definition is extended to autoregressive models of arbitrary order $p$ in 
Subsection 12.2. In Subsection 12.3, a function called the partial autocorrelation 
function is introduced. In Subsection 12.4, identifying the order of an 
autoregressive model is discussed. 


12.1 The autoregressive model of order 1 


Let $X_t$ be a stationary time series, with zero mean. The simplest model 
describing the correlation between successive terms of $X_t$ is the white noise 
model, for which the autocorrelations are zero at all lags $k \ge 1$. (White noise 
was discussed in Example 11.2.) However, the white noise model is very 
restrictive. For example, it cannot describe processes for which the past 
history of the process influences its future course. This is illustrated in 
Example 12.1. 


Example 12.1 Daily sales of a dairy product 


Figure 12.1 shows the time plot of daily sales of a dairy product over a 100-day 
period. (Source: DeLurgio, S.A. (1998) Forecasting Principles and 
Applications. McGraw-Hill, Singapore.) 

Figure 12.1 Daily sales of a dairy product 


There is no clear trend in the time plot: the values appear to fluctuate around a 
mean of about 199. The size of the fluctuations also appears to be roughly 
constant over the time period. Thus there is no reason to suggest that the time 
series is not stationary. 
To investigate possible correlations between sales at different time intervals, the 
correlogram is used. The correlogram is shown in Figure 12.2, together with 95% 
significance bounds. 

Figure 12.2 Correlogram for daily sales of a dairy product 


The sample autocorrelations at lags 1, 2 and 3 clearly exceed the 95% significance 
bounds. Thus the correlogram indicates that the white noise model is 
inappropriate for this time series: sales on adjacent days are correlated. 


In many situations it is appropriate to allow for the possibility that $X_t$ might 
depend on previous values. A simple model allowing for this is 

$$X_t = \beta X_{t-1} + Z_t,$$

where $\beta$ is a constant and $Z_t$ is white noise with mean zero and variance $\sigma^2$. 
(The symbol $\beta$ is the Greek lower-case letter beta.) Note 
that if $\beta = 0$, then $X_t$ is white noise. But if $\beta \ne 0$, then there is a non-zero 
correlation between $X_t$ and $X_{t-1}$. In order to ensure that $X_t$ is stationary, it is 
necessary to impose the condition $-1 < \beta < 1$: if this condition were not satisfied, 
then the values of $X_t$ would tend to increase in magnitude. 


This model is the autoregressive model of order 1, also called the AR(1) 
model. The model may be extended to include time series $X_t$ with mean $\mu \ne 0$, as 
in the following box. 





Autoregressive model of order 1 

Let $X_t$ be a stationary time series with mean $\mu$. The autoregressive 
model of order 1, or AR(1) model, has the following form: 

$$X_t - \mu = \beta(X_{t-1} - \mu) + Z_t,$$

where $\beta$ is a parameter to be estimated, $-1 < \beta < 1$, and $Z_t$ is white noise 
with mean 0 and variance $\sigma^2$. 
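
To get a feel for the AR(1) model, it can help to generate a series from it. The following Python sketch is an illustration under stated assumptions, not a procedure from this book: the function name `simulate_ar1`, the burn-in period and all parameter values are hypothetical, and the white noise terms are taken to be normally distributed.

```python
import random


def simulate_ar1(beta, mu, sigma, n, seed=0, burn_in=200):
    """Simulate n values from X_t - mu = beta * (X_(t-1) - mu) + Z_t.

    Z_t is normal white noise with standard deviation sigma. The first
    burn_in values are discarded so that the arbitrary starting value
    (deviation zero) has little influence on the returned series.
    """
    if not -1 < beta < 1:
        raise ValueError("stationarity requires -1 < beta < 1")
    rng = random.Random(seed)
    deviation = 0.0  # current value of X_t - mu
    series = []
    for t in range(n + burn_in):
        deviation = beta * deviation + rng.gauss(0.0, sigma)
        if t >= burn_in:
            series.append(mu + deviation)
    return series


# Hypothetical parameter values, chosen only for illustration.
sales_like = simulate_ar1(beta=0.7, mu=199.0, sigma=2.0, n=100)
print([round(x, 1) for x in sales_like[:10]])
```

Plotting such a series for several values of $\beta$ shows how positive values produce runs of neighbouring terms on the same side of the mean, whereas negative values make successive terms tend to fall on opposite sides of it.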


The word ‘regressive’ is derived from ‘regression’, which describes any relationship 
between random variables of the form $Y = \beta X + Z$, and ‘auto’ indicates that the 
$X$ and the $Y$ are successive terms of the same time series. The above model is of 
order 1 because $X_t$ depends directly only on its immediate predecessor, $X_{t-1}$. 





How can we decide whether the AR(1) model is appropriate for a particular time 
series (such as the daily sales of a dairy product discussed in Example 12.1)? One 
approach is to compare the sample autocorrelation function (or ACF) for the time 
series with the theoretical autocorrelation function for the model. (The ACF was 
introduced in Subsection 9.2.) If the two are similar, the model might be 
appropriate. If they are very different, then the model is probably not 
appropriate. 


The autocorrelation function for the AR(1) model is given by 

$$\rho_k = \beta^k, \quad k = 0, 1, 2, \ldots. \tag{12.1}$$

Note that this is the theoretical autocorrelation function corresponding to the 
AR(1) model, as distinct from the sample autocorrelation function displayed in a 
correlogram. Since $-1 < \beta < 1$, the magnitude of $\rho_k$ gradually tails off from 
$\rho_0 = 1$ as the lag $k$ increases. If $\beta > 0$, then all the autocorrelations are positive 
(but eventually become very close to zero). If $\beta < 0$, then the signs of the 
autocorrelations alternate, positive for even lags and negative for odd lags. 
Figure 12.3 shows the autocorrelation functions at lags 1 to 10 for four AR(1) 
models with different values of $\beta$. 

Figure 12.3 The ACFs for four AR(1) models 


Note that the autocorrelations decline in magnitude more rapidly when $\beta = 0.5$ 
than when $\beta = 0.7$. The decline in both cases is said to be exponential. 
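
Formula (12.1) is easy to evaluate directly. The short Python sketch below tabulates the theoretical ACF for four values of $\beta$; the values 0.5 and 0.7 are those mentioned in the text, while the two negative values and the function name `ar1_acf` are assumptions made for illustration.

```python
def ar1_acf(beta, max_lag):
    """Return the theoretical autocorrelations rho_0, ..., rho_max_lag of an AR(1) model."""
    if not -1 < beta < 1:
        raise ValueError("the AR(1) model requires -1 < beta < 1")
    return [beta ** k for k in range(max_lag + 1)]


# Print rho_0 to rho_5 for each illustrative value of beta.
for beta in (0.5, 0.7, -0.5, -0.7):
    print(beta, [round(rho, 3) for rho in ar1_acf(beta, 5)])
```

The output shows the two features described above: the magnitudes decay faster for $\beta = \pm 0.5$ than for $\beta = \pm 0.7$, and the signs alternate when $\beta$ is negative.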


Example 12.2 Daily sales of a dairy product, continued 


Look again at the correlogram for the daily sales of a dairy product in Figure 12.2. 
There is clear evidence of positive autocorrelation up to lag 3 or 4. Thereafter, 
the autocorrelations lie close to or within the 95% significance bounds and should 
not be over-interpreted, though they display a striking pattern. If an AR(1) 
model were appropriate for these data, the parameter $\beta$ would be positive, since 
otherwise the autocorrelations would alternate in sign. Using Figure 12.3 as a very 
rough guide, a value of $\beta$ in excess of 0.5 might be appropriate, since the decline 
in the autocorrelations as the lag increases is not quite as rapid as when $\beta = 0.5$. 





On the basis of these observations, an AR(1) model appears to be a possible 
model for these data. Other possibilities must be considered, and other checks 
must be undertaken before it can be concluded that the AR(1) model is 
appropriate. However, this example provides a flavour of the methods used to 
select an appropriate model. 
Activity 12.1 Chemical process data 


Figure 12.4 shows the time plot of 70 successive readings from a batch chemical 
process, and the correlogram (with 95% significance bounds). (Source: 
O’Donovan, T.M. (1983) Short Term Forecasting: An Introduction to the 
Box–Jenkins Approach. John Wiley & Sons, Chichester.) 

Figure 12.4 Chemical process readings: (a) time plot (b) correlogram 

(a) Is this time series stationary? Explain your answer. (You may assume that 
the series is non-seasonal.) 

(b) Describe the main features of the sample autocorrelation function. 


(c) If an AR(1) model were appropriate for these data, would the coefficient $\beta$ be 
positive or negative? 


12.2 The autoregressive model of order p 


In the autoregressive model of order 1, $X_t$ depends linearly on $X_{t-1}$, but is not 
directly related to earlier terms of the time series. In many practical settings, it is 
natural to assume that $X_t$ might be directly related not only to $X_{t-1}$, but also to 
$X_{t-2}$ and perhaps earlier terms $X_{t-3}$, $X_{t-4}$, and so on. 


Suppose that $X_t$ is a stationary time series with mean zero. A simple model that 
allows for the possibility that $X_t$ is directly related to both $X_{t-1}$ and $X_{t-2}$ is 

$$X_t = \beta_1 X_{t-1} + \beta_2 X_{t-2} + Z_t,$$

where $\beta_1$ and $\beta_2$ are constants, and $Z_t$ is white noise with mean 0 and 
variance $\sigma^2$. This is the autoregressive model of order 2, also written AR(2). 
If $X_t$ has mean $\mu \ne 0$, then the AR(2) model is 

$$X_t - \mu = \beta_1(X_{t-1} - \mu) + \beta_2(X_{t-2} - \mu) + Z_t.$$

In order to ensure that $X_t$ is stationary, conditions must be imposed on the 
constants $\beta_1$ and $\beta_2$: if both constants are too large in magnitude, then successive 
terms $X_t$ will grow in magnitude, so $X_t$ will not be stationary. The conditions 
required are more complicated than for the AR(1) model and will not be given 
here. 
Example 12.3 Viscosity data 


One hundred successive measurements were made of the viscosity of a chemical 
product. Figure 12.5 shows the time plot and correlogram for this time series. 
(Source: O’Donovan, T.M. (1983) Short Term Forecasting: An Introduction to the 
Box–Jenkins Approach. John Wiley & Sons, Chichester.) 

Figure 12.5 Viscosity time series: (a) time plot (b) correlogram 


The time plot in Figure 12.5(a) suggests that the time series is stationary both in 
mean and in variance (or at least there is no compelling reason not to assume 
this). The correlogram shows an alternating pattern. The pattern is reminiscent 
of the sample ACF for an AR(1) time series with negative coefficient, with some 
important differences. For example, the large positive autocorrelation at lag 1 is 
followed not by one negative autocorrelation, but by several. These are followed 
by two positive autocorrelations at lags 6 and 7. The autocorrelations then dip 
below the 95% significance bounds. 


For reasons that will be explained in Subsection 12.4, an appropriate model for 
this time series is the AR(2) model. 


The theoretical autocorrelation function for an AR(2) model is more complicated 
than that for an AR(1) model. However, for all autoregressive models, the 
autocorrelation function either declines exponentially, or alternates in positive and 
negative clumps that tail off in height as the lag increases (this pattern is called 
damped sinusoidal). 











The autoregressive model can be extended to include direct dependence between 
$X_t$ and $X_{t-1}, X_{t-2}, \ldots, X_{t-p}$ for some arbitrary integer $p \ge 1$. The definition is 
given in the following box. 


Autoregressive model of order p 

Let $X_t$ be a stationary time series with mean $\mu$. The autoregressive 
model of order $p$, or AR($p$) model, has the following form: 

$$X_t - \mu = \beta_1(X_{t-1} - \mu) + \beta_2(X_{t-2} - \mu) + \cdots + \beta_p(X_{t-p} - \mu) + Z_t,$$

where $\beta_1, \beta_2, \ldots, \beta_p$ are parameters to be estimated, and $Z_t$ is white noise 
with mean 0 and variance $\sigma^2$. 
Activity 12.2 The order of an autoregressive model 


In each of the models below, $X_t$ is a stationary time series with mean zero, and $Z_t$ 
is white noise. Say whether each model is an autoregressive model, giving a 
reason for your answer in each case. If it is an autoregressive model, state its 
order and write down the values of the parameters $\beta_1, \ldots, \beta_p$. 

(a) $X_t = 0.5X_{t-1} + 0.3X_{t-2} + Z_t$ 
(b) $X_t = 0.6X_{t-1} - 0.2X_{t-2} - 0.05X_{t-3} + Z_t$ 
(c) $X_t = 0.5X_{t-1}^2 + 0.3X_{t-2} + Z_t$ 
(d) $X_t = -0.6X_{t-1} + 0.1X_{t-2} + Z_t$ 


12.3 The partial autocorrelation function 


In practice, it is not usually possible to identify the order of an autoregressive 
model just by examining its autocorrelation function. This is illustrated in 
Example 12.4. 


Example 12.4 Autocorrelation functions for autoregressive models 


Figure 12.6 shows the theoretical autocorrelation functions at lags 1 to 15 for four 
autoregressive time series models. 

Figure 12.6 The ACFs for four autoregressive models 


Figure 12.6(a) shows the ACF for the AR(2) model 
$X_t = 1.2X_{t-1} - 0.85X_{t-2} + Z_t$, and is very similar to Figure 12.6(b), which shows 
the ACF for the AR(1) model $X_t = 0.8X_{t-1} + Z_t$. 


Similarly, Figure 12.6(c) shows the ACF for the AR(2) model 
$X_t = -1.2X_{t-1} - 0.85X_{t-2} + Z_t$, which looks much the same as Figure 12.6(d), 
which is the ACF for the AR(1) model $X_t = -0.8X_{t-1} + Z_t$. 


For each pair of series, there are subtle differences in the values of the 
autocorrelations. But it is not easy to tell which is which by inspecting these 
plots. Furthermore, these are plots of the theoretical ACF: plots based on data 
(correlograms) are less regular and harder to interpret. 


Example 12.4 shows that the ACF may not be of much help in identifying the 
order of an autoregressive time series. However, a function called the partial 
autocorrelation function, or PACF, can make the identification easier. 





The idea behind the PACF will be explained in the context of autoregressive 
models. However, note that the PACF is defined for all stationary time series, not 
just for autoregressive models. 


Consider the AR(1) model 

$$X_t = \beta X_{t-1} + Z_t.$$

The same model, applied at time $t - 1$, gives 

$$X_{t-1} = \beta X_{t-2} + Z_{t-1}.$$

The model thus specifies a direct dependence between $X_t$ and its predecessor 
$X_{t-1}$, and a direct dependence between $X_{t-1}$ and $X_{t-2}$. These successive direct 
dependencies induce an indirect dependence between $X_t$ and $X_{t-2}$, which is 
manifested as a non-zero autocorrelation $\beta^2$ at lag 2. The key point, however, is 
that the dependence between $X_t$ and $X_{t-2}$ is indirect, and works through $X_{t-1}$: 
all the dependence between $X_t$ and $X_{t-2}$ is accounted for by the correlation 
between $X_t$ and $X_{t-1}$ and the correlation between $X_{t-1}$ and $X_{t-2}$. There is no 
direct dependence between $X_t$ and $X_{t-2}$. The partial autocorrelation between $X_t$ 
and $X_{t-2}$ is a measure of the direct dependence between $X_t$ and $X_{t-2}$, so for the 
AR(1) model it is zero. 


The idea of direct and indirect dependence between the terms in a time series can 
be used to define the partial autocorrelation between any two terms in any time 
series. In general, for any time series $X_t$, the partial autocorrelation between 
$X_t$ and $X_{t-k}$ is a measure of the dependence between $X_t$ and $X_{t-k}$ that is not 
accounted for by correlations with the intermediate values 
$X_{t-1}, X_{t-2}, \ldots, X_{t-k+1}$. Thus it is a measure of the direct dependence between 
$X_t$ and $X_{t-k}$. Like an ordinary autocorrelation, a partial autocorrelation is a 
number between $-1$ and $1$. The partial autocorrelation between $X_t$ and $X_{t-k}$ is 
zero if there is no direct dependence between them. 











For a stationary time series $X_t$, the partial autocorrelations depend only on the 
lags. The partial autocorrelation function or PACF is defined as follows: 

$$\alpha_k = \text{partial autocorrelation between } X_t \text{ and } X_{t-k}, \quad k = 0, 1, \ldots.$$

(The expression for $\alpha_k$ has been omitted. However, you will be expected to 
interpret partial autocorrelations.) 

Since $X_t$ is perfectly and directly correlated with itself, $\alpha_0 = 1$. Also, since there 
are no intermediate terms between $X_t$ and $X_{t-1}$, the correlation between them 
must be direct, so $\alpha_1 = \rho_1$. However, for lags greater than 1, $\alpha_k$ and $\rho_k$ usually 
differ. The interpretation of $\alpha_k$ is similar to that of $\rho_k$: the key point to remember 
is that $\alpha_k$ relates only to the extent of direct dependence at lag $k$, that is, to 
dependence not accounted for by correlations with intermediate values. 
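
Although the general expression for the partial autocorrelation is omitted from this text, the lag 2 case is simple enough to illustrate the idea. The formula used below is a standard result for stationary time series; it is quoted from outside this book and without proof. Substituting the AR(1) autocorrelations from (12.1) gives a partial autocorrelation of zero at lag 2, in agreement with the argument about direct and indirect dependence given above.

```latex
\alpha_2 = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}
\quad\text{(standard result, not derived in this text)};
\qquad
\text{for AR(1): } \rho_1 = \beta,\ \rho_2 = \beta^2
\;\Longrightarrow\;
\alpha_2 = \frac{\beta^2 - \beta^2}{1 - \beta^2} = 0 .
```

The numerator compares the actual autocorrelation at lag 2 with the value $\rho_1^2$ that would arise purely through the intermediate term, so it vanishes exactly when all the dependence at lag 2 is indirect.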
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The autoregressive model of order p for a time series X; with zero mean is 
Xt = By Xt-1 + BoXt-2 + +++ + Gy Xt-p + Zt. 


This specifies direct dependencies between X+ and each of X¢_1, Xz~2,..., Xt~p. 
There is no direct dependence between terms separated by lags greater than p, 
though indirect dependence between them will generally be induced by the 
intermediate terms. The lack of any direct dependence at lags greater than p 
means that the partial autocorrelations at lags greater than p are all zero. In fact, 
it can be shown that for an AR(p) model, 














Qp = p and az =0 fork >p. 


The fact that the PACF for an AR(p) model is zero at all lags greater than p, and 
is non-zero at lag p, can be used to identify the order of an autoregressive model. 
This is illustrated in Example 12.5. 
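
The cut-off property can be checked numerically. The sketch below is not the method used in this book (which omits the expression for the PACF): it assumes the standard Yule-Walker equations to obtain the theoretical ACF of an AR(p) model, and the Durbin-Levinson recursion to convert an ACF into a PACF. The function names `ar_acf` and `pacf_from_acf`, and the AR(2) coefficients 0.5 and 0.3, are illustrative choices only.

```python
import numpy as np

def ar_acf(beta, max_lag):
    """Theoretical ACF rho_0..rho_max_lag of a stationary AR(p) model
    X_t = beta_1 X_{t-1} + ... + beta_p X_{t-p} + Z_t (Yule-Walker equations)."""
    beta = np.asarray(beta, dtype=float)
    p = len(beta)
    # Linear system for rho_1..rho_p: rho_k = sum_j beta_j * rho_|k-j|, rho_0 = 1.
    A = np.eye(p)
    b = np.zeros(p)
    for k in range(1, p + 1):
        for j in range(1, p + 1):
            lag = abs(k - j)
            if lag == 0:
                b[k - 1] += beta[j - 1]          # term beta_k * rho_0
            else:
                A[k - 1, lag - 1] -= beta[j - 1]
    rho = np.ones(max(max_lag, p) + 1)
    rho[1:p + 1] = np.linalg.solve(A, b)
    # Higher lags follow from the same recursion.
    for k in range(p + 1, len(rho)):
        rho[k] = sum(beta[j - 1] * rho[k - j] for j in range(1, p + 1))
    return rho[:max_lag + 1]

def pacf_from_acf(rho):
    """Partial autocorrelations alpha_0..alpha_K from rho_0..rho_K
    (Durbin-Levinson recursion)."""
    K = len(rho) - 1
    alpha = np.zeros(K + 1)
    alpha[0] = 1.0
    phi_prev = np.array([])                      # coefficients at the previous lag
    for k in range(1, K + 1):
        num = rho[k] - np.dot(phi_prev, rho[k - 1:0:-1])
        den = 1.0 - np.dot(phi_prev, rho[1:k])
        phi_kk = num / den
        phi_prev = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        alpha[k] = phi_kk
    return alpha

# Illustrative AR(2) model with beta_1 = 0.5 and beta_2 = 0.3.
rho = ar_acf([0.5, 0.3], max_lag=6)
alpha = pacf_from_acf(rho)
print(np.round(alpha, 4))
```

For this model the value at lag 2 should equal the coefficient 0.3, and the values at lags 3 and above should be zero up to rounding error, in line with the result stated above.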


Example 12.5 The PACF for an autoregressive model 


The ACFs for four autoregressive models were shown in Figure 12.6. Figure 12.7
shows the theoretical partial autocorrelation functions, or PACFs, for these
models.

Figure 12.7 The partial autocorrelation functions for four autoregressive models





The PACFs in Figures 12.7(a) and 12.7(c) are zero after lag 2: it can therefore be
concluded that these autoregressive models are of order 2. In contrast, the PACFs
in Figures 12.7(b) and 12.7(d) are zero after lag 1: these autoregressive models
are therefore of order 1.

Activity 12.3 Interpreting the PACF


Figure 12.8 shows the PACFs for two AR(p) models.

Figure 12.8 The PACFs for two AR(p) models

In each case, identify the value of p, and obtain a rough estimate of $\beta_p$.


12.4 Identifying the order of an autoregressive model


The partial autocorrelations can be estimated from a time series $x_1, x_2, \ldots, x_n$,
giving rise to the sample PACF: for each k, the sample partial autocorrelation
$\hat{\alpha}_k$ is an estimate of the partial autocorrelation $\alpha_k$. Details of how the PACF is
estimated are omitted: calculation of the sample PACF is done by computer.





If the underlying model were the white noise model, then all the partial
autocorrelations would be zero. Under the null hypothesis of zero partial
autocorrelation, the distributions of the sample partial autocorrelations for a time
series with n observations are approximately $N(0, 1/n)$. Thus a sample partial
autocorrelation $\hat{\alpha}_k$ greater than $+1.96/\sqrt{n}$ or less than $-1.96/\sqrt{n}$ may be
interpreted as providing at least moderate evidence against the null hypothesis
$\alpha_k = 0$.
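
As a small illustration of this rule, the following sketch computes the bounds and flags the lags whose sample partial autocorrelations fall outside them. The function names and the sample values are made up for the example; they are not taken from any data set in this book.

```python
import math

def pacf_bounds(n, z=1.96):
    """Approximate significance bounds (-z/sqrt(n), +z/sqrt(n)) for a series of length n."""
    half_width = z / math.sqrt(n)
    return -half_width, half_width

def significant_lags(sample_pacf, n):
    """Lags k >= 1 whose sample partial autocorrelation lies outside the bounds.
    sample_pacf[k - 1] is assumed to hold the value at lag k."""
    lower, upper = pacf_bounds(n)
    return [k for k, a in enumerate(sample_pacf, start=1) if a < lower or a > upper]

# Hypothetical sample partial autocorrelations at lags 1 to 5 for n = 100 observations.
values = [0.62, -0.31, 0.08, -0.12, 0.05]
print(pacf_bounds(100))               # bounds are roughly -0.196 and +0.196
print(significant_lags(values, 100))  # [1, 2]
```

Here only lags 1 and 2 are flagged, which is the kind of pattern that would point towards an autoregressive model of order 2.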








The sample ACF is represented as a bar chart (the correlogram), and the sample
PACF is also represented as a bar chart. The bar chart for the sample PACF is
called the partial correlogram, or sample PACF plot. Only sample partial
autocorrelations at lags 1 to 20 will usually be shown. Significance bounds are
often drawn as horizontal lines at $-1.96/\sqrt{n}$ and $+1.96/\sqrt{n}$ on the partial
correlogram. These provide a guide for deciding which of the underlying partial
autocorrelations are non-zero.










The interpretation of the sample PACF is similar to that of the sample ACF,
which was described in Subsection 9.1.

Example 12.6 Sample PACF for the viscosity data

In Example 12.3, a stationary time series of viscosity measurements was discussed.
The partial correlogram for this time series for lags 1 to 20 is shown in Figure 12.9.

Figure 12.9 Partial correlogram for the viscosity data


The sample partial autocorrelations at lags 1 and 2 are large in absolute value, 
and exceed the significance bounds. The sample partial autocorrelations at higher 
lags are smaller, and lie within the significance bounds. 


The partial correlogram suggests that the partial autocorrelations at lags 1 and 2
are non-zero, but provides little evidence that the partial autocorrelations at higher
lags are non-zero. Accordingly it is reasonable to conclude that $\alpha_2 \neq 0$ and $\alpha_k = 0$
for $k > 2$. Thus an AR(2) model might be appropriate for this time series.








Note the use of the word ‘might’ in the last sentence of Example 12.6: it is 
important to keep an open mind about what models may be appropriate. In 
particular, models other than an AR(2) model might fit the data equally well. 
Nevertheless, the partial correlogram provides a powerful tool in choosing a 
model. Activity 12.4 will give you some practice at using the partial correlogram 
in this way.

Activity 12.4 Choosing an autoregressive model for the dairy sales time series


In Example 12.1, a time series of daily sales of a dairy product was discussed. The 
time series appeared to be stationary, and its correlogram indicated marked 
departure from white noise. Figure 12.10 shows the partial correlogram for this 
time series.

Figure 12.10 Partial correlogram for the dairy sales time series

The time series is to be modelled using an AR(p) model. Suggest a suitable value
of p. Explain your choice.


The following box summarizes the key features of autoregressive models. 


Autoregressive models 


Let $X_t$ be a stationary time series with mean $\mu$. The autoregressive
model of order p, or AR(p) model, where p is a positive integer, has the
following form:

    $X_t - \mu = \beta_1 (X_{t-1} - \mu) + \cdots + \beta_p (X_{t-p} - \mu) + Z_t,$

where $\beta_1, \ldots, \beta_p$ are parameters to be estimated, and $Z_t$ is white noise with
mean 0 and variance $\sigma^2$.

The ACF for an AR(1) model is given by $\rho_k = \beta^k$ for $k \ge 0$. The ACF for
an AR(p) model tails off, either exponentially or in damped sinusoidal fashion,
with increasing lag.

The PACF for an AR(p) model satisfies $\alpha_p = \beta_p$, and $\alpha_k = 0$ for lags $k > p$.

Summary of Section 12


In this section, autoregressive models for stationary time series have been 
introduced. The properties of autoregressive models of order 1 and of arbitrary 
order p have been described. The partial autocorrelation function (PACF) has been
defined. The characteristics of the PACF for autoregressive models have been
outlined. You have learned how to use the partial correlogram to choose the order 
of an autoregressive model. 





Exercises on Section 12 


Exercise 12.1 Autoregressive or not? 


For each of the following time series models, state whether or not it is an
autoregressive model. If your answer is no, explain why it is not an autoregressive
model. If your answer is yes, state the order of the model. In each case, $X_t$ is
stationary with mean zero, and $Z_t$ represents white noise.


(a) $X_t = 0.9X_{t-1} - 0.5X_{t-2} + Z_t$
(b) $X_t = -0.6X_{t-1} + 0.2X_{t-2} - 0.3X_{t-1}X_{t-2} + Z_t$
(c) $X_{t+1} = -0.2X_t + Z_{t+1}$


Exercise 12.2 Chemical process readings 


In Activity 12.1, a time series of readings from a chemical process was described. 
The partial correlogram for this time series is shown in Figure 12.11. 




Figure 12.11 Partial correlogram for the chemical process data 


(a) The time series is to be modelled using an autoregressive model. In your 
view, is this appropriate? If so, what is the order of the model? Give reasons 
for your answers. 


(b) Assume that an AR(1) model is suitable. Use Figure 12.11 to obtain a rough
estimate of the parameter $\beta$.

Autoregressive models are used to represent long-term dependence. For example,
the autocorrelation function for an AR(1) model is given by $\rho_k = \beta^k$ for $k \ge 0$.
For large lags, the autocorrelation is small, but non-zero, so a value observed a
long time ago still exerts some influence over current and future values.


In this section, a family of models for stationary time series that can be used to 
represent short-term dependence is introduced. These are the moving average 
models. Like autoregressive models, moving average models are characterized by 
their order. In Subsection 13.1, the moving average model of order 1 is introduced. 
In Subsection 13.2, more general moving average models are described, and a 
method for choosing an appropriate moving average model is discussed. 








13.1 The moving average model of order 1 


Let $X_t$ denote a stationary time series with zero mean. A model is required that
can be used to represent short-term dependence between successive terms.
Consider first the white noise model

    $X_t = Z_t,$

where $Z_t$ is a stationary time series of uncorrelated terms with mean zero and
variance $\sigma^2$. The ACF for this model is zero at all lags $k \ge 1$, so for this model,
there is no dependence at all between terms in the time series. Now consider the
model








    $X_t = Z_t - \theta Z_{t-1},$

where $\theta$ is a constant. (You may wonder why $-\theta Z_{t-1}$ is used rather than
$+\theta Z_{t-1}$: this is just a convention.) Applying this model at time t - 1 yields

    $X_{t-1} = Z_{t-1} - \theta Z_{t-2}.$

Since $Z_{t-1}$ occurs both in the expression for $X_t$ and in that for $X_{t-1}$, there is a
non-zero correlation between $X_t$ and $X_{t-1}$. For example, suppose that $\theta < 0$. In
this case, if $Z_{t-1}$ happens to be large and positive, then both $X_t$ and $X_{t-1}$ will
tend to be large and positive. So there is a positive correlation between $X_t$ and
$X_{t-1}$. Applying the model again at time t - 2 gives

    $X_{t-2} = Z_{t-2} - \theta Z_{t-3}.$

The expression for $X_{t-2}$ does not share any terms with the expression for $X_t$.
Since the $Z_t$ are uncorrelated, this means that the correlation between $X_t$ and
$X_{t-2}$ is zero. Similarly, the correlation between $X_t$ and $X_{t-k}$ is zero for all $k \ge 2$.
Thus the series $X_t$ exhibits non-zero dependence only at lag 1, and hence the
dependence is short-term.
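
The argument can be made explicit with a short calculation. The sketch below assumes only that the $Z_t$ are uncorrelated with mean zero and variance $\sigma^2$; the symbol $\gamma_k$, denoting the covariance between $X_t$ and $X_{t-k}$, is introduced here purely for illustration.

```latex
\begin{align*}
\gamma_0 &= \operatorname{Var}(Z_t - \theta Z_{t-1}) = (1 + \theta^2)\,\sigma^2, \\
\gamma_1 &= \operatorname{Cov}(Z_t - \theta Z_{t-1},\; Z_{t-1} - \theta Z_{t-2}) = -\theta\,\sigma^2, \\
\gamma_k &= 0 \quad \text{for } k \ge 2 \text{ (no shared terms)}, \\
\rho_1 &= \frac{\gamma_1}{\gamma_0} = \frac{-\theta}{1 + \theta^2}.
\end{align*}
```

In particular, the lag 1 autocorrelation is positive when $\theta < 0$ and negative when $\theta > 0$, which agrees with the verbal argument given for the case $\theta < 0$.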











This model for $X_t$ is the moving average model of order 1, also written
MA(1). The MA(1) model is stationary whatever the value of $\theta$. However, for
technical reasons that will not be discussed here, the parameter $\theta$ is restricted to
the range $-1 < \theta < 1$.

Moving averages were introduced in Section 4 as a way of smoothing a time
series. The moving average model described here should not be confused
with them.





The MA(1) model may be generalized to time series with non-zero mean; the 
definition is given in the following box. 


Moving average model of order 1 


Let $X_t$ be a stationary time series with mean $\mu$. The moving average
model of order 1, or MA(1) model, has the following form:

    $X_t - \mu = Z_t - \theta Z_{t-1},$

where $\theta$ is a parameter to be estimated, $-1 < \theta < 1$, and $Z_t$ is white noise
with mean 0 and variance $\sigma^2$.

As just explained, the autocorrelation function for the MA(1) model is zero at all
lags greater than 1. In fact, it can be shown that the ACF is as follows:

    $\rho_1 = \dfrac{-\theta}{1 + \theta^2}, \qquad \rho_k = 0 \text{ for } k > 1.$    (13.1)


On the other hand, the partial autocorrelation function for the MA(1) model can
be shown to decline exponentially in magnitude with increasing lag. If $\theta < 0$, the
successive partial autocorrelations alternate in sign. If $\theta > 0$, the partial
autocorrelations are negative at lags $k \ge 1$. Figure 13.1 shows the theoretical
autocorrelation function and the theoretical partial autocorrelation function for
lags 1 to 10 for MA(1) models with $\theta = 0.8$ and $\theta = -0.8$.

Figure 13.1 The ACF and the PACF for two MA(1) models


Note the duality between AR(1) and MA(1) models. In Section 12, you saw that 
for an AR(1) model, the ACF declines exponentially in magnitude and the PACF 
cuts off after lag 1. For an MA(1) model, the ACF cuts off after lag 1 and the 
PACF declines exponentially in magnitude. This duality can be exploited to 
choose a model for a time series, by examining the sample ACF and the sample 
PACF for the series. 


Example 13.1 Annual changes in average temperature, 1901-2004


Most scientists are agreed that the climate of the Earth is changing. In the UK, 
annual average temperatures have increased over the last century. To analyse the 
rate of change of average temperatures, the annual change from year to year can
be computed: that is, the average temperature in year t minus the average
temperature in year t - 1. (This gives the time series of first differences of annual
average temperatures.)

















The time series of annual temperature changes between 1901 and 2004 in Central
England is shown in Figure 13.2.

Figure 13.2 Time plot of annual changes in average temperature, 1901-2004


There is no clear trend or other systematic variation in the level or the size of the 
fluctuations, so the time series is stationary in mean and in variance. Thus there 
is no evidence to suggest that this time series is not stationary. 


The correlogram and the partial correlogram for the time series are shown in
Figure 13.3.

Figure 13.3 Changes in average temperature: (a) correlogram (b) partial correlogram


The correlogram in Figure 13.3(a) shows a clear negative autocorrelation at lag 1. 
Thereafter, the sample autocorrelations are smaller in magnitude and generally lie 
within the 95% significance bounds. There is one exception: the sample 
autocorrelation at lag 10 slightly exceeds the significance bound. Autocorrelations 
at higher lags are difficult to interpret, especially since one sample autocorrelation 
out of 20 could be expected to exceed the bounds by chance, even if the 
underlying autocorrelations were zero. Thus it is reasonable to ignore the 
autocorrelation at lag 10. 








The partial correlogram shows negative values for the first few lags. The sample 
partial autocorrelations tend to decline in magnitude, though this pattern is by no 
means regular — for example, the values at lag 2 and lag 3 are smaller in 
magnitude than the value at lag 4. 





It may therefore be concluded that the patterns in Figures 13.3(a) and 13.3(b) are
broadly similar to those shown in Figures 13.1(a) and 13.1(b). Thus it is not
unreasonable to interpret the patterns as suggestive of an MA(1) model, with the
parameter $\theta$ taking a positive value.

In Example 13.1, a model was identified by comparing the sample ACF and
sample PACF to the theoretical ACF and PACF. In fact, this is the standard way 
to choose a model. In making such comparisons, it is important to focus on the 
general features of the plots, and avoid being too fastidious about small 
differences between the sample and theoretical values. Activity 13.1 will give you 
some practice at identifying a model. 





Activity 13.1 Variation in yield of Government securities 


In this activity you will investigate the time series of monthly changes in the yield 
of British Government securities; increases in yield are positive, decreases are 
negative. This time series comprises the first differences of the time series of 
monthly yields, which was described in Example 11.4. Figure 13.4 shows the time 
plot for the time series of monthly changes, and Figure 13.5 shows the
correlogram and the partial correlogram.

Figure 13.4 Time plot of monthly changes in yield

Figure 13.5 Monthly changes in yield: (a) correlogram (b) partial correlogram

(a) Is the time series stationary? Explain your answer. (You may assume that
the time series is not seasonal.)


(b) It has been suggested that an MA(1) model is appropriate for this time 
series. Explain why this is a reasonable suggestion. 


(c) Would you expect the value of the parameter $\theta$ to be positive or negative?
Explain your answer.


13.2 The moving average model of order q 





The moving average model of order 1 can be extended to include further terms.
For example, the moving average model of order 2, or MA(2) model, for a
stationary time series $X_t$ with mean $\mu$, has the following form:

    $X_t - \mu = Z_t - \theta_1 Z_{t-1} - \theta_2 Z_{t-2},$    (13.2)

where $\theta_1$ and $\theta_2$ are parameters to be estimated, and $Z_t$ is white noise with mean
zero and variance $\sigma^2$. As for the MA(1) model, for technical reasons some
conditions are required on the parameters $\theta_1$ and $\theta_2$.

You may assume that all the models presented in this book satisfy these conditions.


Example 13.2 A simulated MA(2) time series 


A time series was generated using the MA(2) model with $\mu = 16$, $\theta_1 = 0.6$,
$\theta_2 = 0.2$ and $\sigma^2 = 2$. That is, values of independent normal random variables
$Z_t \sim N(0, 2)$ were simulated and combined using the defining formula for the
MA(2) model given in (13.2). The time plot for the first 100 values obtained is
shown in Figure 13.6.

Figure 13.6 Time plot for 100 simulated values


The size of the fluctuations appears to increase slightly. However, the values were 
simulated using a model with constant variance, so this effect is due to chance. 
This serves to emphasize the general point that small effects should not be 
over-interpreted.

The correlogram and the partial correlogram are shown in Figure 13.7.

Figure 13.7 Simulated data: (a) correlogram (b) partial correlogram


The correlogram shows two (relatively) sizeable sample autocorrelations at lags 1
and 2. The sample autocorrelations at lags 3 to 20 are smaller in magnitude, and
are generally contained within the significance bounds. Just one autocorrelation,
at lag 14, peeps over the boundary. In fact, the underlying autocorrelation at
lag 14 is known to be zero. This confirms that sample autocorrelations at higher
lags should not be over-interpreted, unless there is good reason to suspect an
effect.


In the partial correlogram, notice that the sample partial autocorrelations at the 
first few lags are negative, and the sample partial autocorrelations generally 
decline in magnitude after lag 2. 


In general, the partial autocorrelation function for an MA(2) model either 
declines exponentially in magnitude, or exhibits a damped sinusoidal pattern 
(that is, alternating clumps of positive and negative values that tail off to zero in 
magnitude as the lag increases). The pattern in the partial correlogram in 
Figure 13.7(b) roughly matches the theoretical PACF for an MA(2) model. 


The correlogram in Figure 13.7(a) displays sizeable sample autocorrelations only
at lags 1 and 2. In fact, it can be shown that the theoretical autocorrelation
function for an MA(2) model is non-zero at lag 2, and is zero at all lags $k > 2$, so
the pattern in the correlogram is consistent with the theoretical ACF.
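
A simulation of the kind described in Example 13.2 can be sketched as follows. This is not the code used to produce the figures in this book: the function names, the use of NumPy's random number generator and the seed are assumptions made for illustration, so the simulated values will differ from those plotted in Figure 13.6.

```python
import numpy as np

def simulate_ma2(n, mu=16.0, theta1=0.6, theta2=0.2, sigma2=2.0, seed=0):
    """Simulate n values of X_t = mu + Z_t - theta1*Z_{t-1} - theta2*Z_{t-2},
    where the Z_t are independent N(0, sigma2) random variables."""
    rng = np.random.default_rng(seed)
    # Two extra noise terms supply Z_{t-1} and Z_{t-2} for the first observations.
    z = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=n + 2)
    return mu + z[2:] - theta1 * z[1:-1] - theta2 * z[:-2]

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag of the series x."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    c0 = np.dot(d, d)
    return np.array([np.dot(d[k:], d[:len(d) - k]) / c0 for k in range(max_lag + 1)])

x = simulate_ma2(100)           # 100 simulated values, as in the example
r = sample_acf(x, max_lag=5)    # sample autocorrelations at lags 0 to 5
print(np.round(r, 3))
```

With only 100 values the sample autocorrelations are noisy, which is why isolated values just outside the significance bounds should not be over-interpreted. Increasing `n` brings the sample ACF closer to the theoretical pattern of non-zero values at lags 1 and 2 only.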


In general, for an MA(2) model,

    $\rho_2 = \dfrac{-\theta_2}{1 + \theta_1^2 + \theta_2^2}, \qquad \rho_k = 0 \text{ for } k > 2.$    (13.3)

The ACFs for four MA(2) models are shown in Figure 13.8.

Figure 13.8 The ACFs for four MA(2) models


Notice that for each of these four MA(2) models, $\rho_2$ is non-zero and $\rho_k = 0$ for all
$k > 2$.





The moving average model of order 1 may be extended to arbitrary order q, where
q is a positive integer. The moving average model of order q, or MA(q)
model, for a time series $X_t$ with mean $\mu$, has the following form:

    $X_t - \mu = Z_t - \theta_1 Z_{t-1} - \cdots - \theta_q Z_{t-q},$

where $\theta_1, \theta_2, \ldots, \theta_q$ are constants, and $Z_t$ is white noise with zero mean and
variance $\sigma^2$. The autocorrelation function for an MA(q) model is non-zero at
lag q, and zero at all lags $k > q$:

    $\rho_q = \dfrac{-\theta_q}{1 + \theta_1^2 + \cdots + \theta_q^2}, \qquad \rho_k = 0 \text{ for } k > q.$    (13.4)

Notice that when q = 1, Formula (13.4) reduces to Formula (13.1), and when
q = 2, it is equivalent to Formula (13.3). So Formula (13.4) can be used to find
the autocorrelation at lag q for an MA(q) model for any order $q \ge 1$.
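
Formula (13.4) is easy to evaluate by hand, but a short function makes it simple to experiment with different parameter values. The sketch below also returns the autocorrelations at lags below q, using the standard covariance calculation for a sum of uncorrelated terms; that part goes beyond what is stated in this section and is included as an assumption. The function name `ma_acf` is illustrative.

```python
def ma_acf(theta, max_lag):
    """Theoretical ACF rho_0..rho_max_lag for the MA(q) model
    X_t = Z_t - theta_1 Z_{t-1} - ... - theta_q Z_{t-q}."""
    # Coefficients of Z_t, Z_{t-1}, ..., Z_{t-q}: 1, -theta_1, ..., -theta_q.
    psi = [1.0] + [-float(t) for t in theta]
    q = len(theta)
    gamma0 = sum(c * c for c in psi)      # variance divided by sigma^2
    rho = []
    for k in range(max_lag + 1):
        if k > q:
            rho.append(0.0)               # the ACF cuts off after lag q
        else:
            rho.append(sum(psi[j] * psi[j + k] for j in range(q - k + 1)) / gamma0)
    return rho

print(ma_acf([0.8], 3))        # MA(1) with theta = 0.8
print(ma_acf([0.6, 0.2], 4))   # MA(2) with theta_1 = 0.6, theta_2 = 0.2
```

For the MA(1) model the lag 1 value agrees with Formula (13.1), and for the MA(2) model the lag 2 value agrees with Formula (13.3).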


The partial autocorrelation function tails off to zero in magnitude, either 
exponentially or in a damped sinusoidal pattern. 


In Subsection 13.1, the ACFs and PACFs for MA(1) and AR(1) models were 
contrasted. Now consider the ACFs and PACFs for MA(q) and AR(p) models for 
general p and q. The autocorrelation function for an MA(q) model is non-zero at 
lag q and zero at all lags k > q, whereas the partial autocorrelation function tails 
off to zero in magnitude, either exponentially or in a damped sinusoidal pattern. 
In contrast, for an AR(p) model, the PACF cuts off after lag p and the ACF tails 
off to zero in magnitude. This difference is useful for choosing between a moving 
average model and an autoregressive model for a time series.

The definition and key properties of moving average models of order q are
summarized in the following box.


Moving average models


Let $X_t$ be a stationary time series with mean $\mu$. The moving average
model of order q, or MA(q) model, where q is a positive integer, has the
following form:

    $X_t - \mu = Z_t - \theta_1 Z_{t-1} - \cdots - \theta_q Z_{t-q},$

where $\theta_1, \ldots, \theta_q$ are parameters to be estimated, and $Z_t$ is white noise with
mean 0 and variance $\sigma^2$.

The ACF for an MA(q) model satisfies

    $\rho_q = \dfrac{-\theta_q}{1 + \theta_1^2 + \cdots + \theta_q^2}, \qquad \rho_k = 0 \text{ for } k > q.$

The PACF for an MA(q) model tails off to zero in magnitude, either
exponentially or in a damped sinusoidal pattern, as the lag increases.


Activity 13.2 Moving average models 


Each of the expressions below represents a model for a stationary time series $X_t$
with mean zero. In each case, $Z_t$ is white noise. For each of the models, say
whether or not it is a moving average model. If it is a moving average model,
state the order q and calculate the autocorrelation at lag q. If it is not a moving
average model, explain why not.


(a) $X_t = Z_t - 0.2Z_{t-1} + 0.3\sqrt{Z_{t-2}}$
Di r= 7-=057.4.—0 27-5 
CG; 4=]02% 447-0174 
(d) $X_t = Z_t + 0.9Z_{t-1}$


Suppose that data $x_1, x_2, \ldots, x_n$ are available on a time series, and that the time
plot suggests that the time series is stationary. If a moving average model is
appropriate for the data, then the sample ACF and the sample PACF should
roughly match the theoretical ACF and the theoretical PACF for the model.
Matching the sample and the theoretical ACFs and PACFs in this way has been
illustrated in Examples 13.1 and 13.2.


If a moving average model is appropriate, then the sample ACF should be close to 
zero after some lag q which defines the order of the model, and the sample PACF 
should tail off to zero with increasing lag. Activity 13.3 will give you some practice 
at deciding when a moving average model is appropriate, and choosing its order.

Activity 13.3 Choosing a moving average model


Figure 13.9 shows the time plot of the monthly percentage increase in seasonally
adjusted electricity demand at a plant in California.

Source: Delurgio, S.A. (1988) Forecasting Principles and Applications.
McGraw-Hill, Singapore.

Figure 13.9 Percentage increase in seasonally adjusted electricity demand

The correlogram and partial correlogram for this time series are shown in
Figure 13.10.

Figure 13.10 Percentage increase in seasonally adjusted electricity demand: (a) correlogram (b) partial correlogram

(a) Is this time series stationary? Explain your answer.


(b) It is suggested that this time series could be modelled using a moving average 
model. Identify two features of the time series that support this suggestion. 


(c) What order would you choose for the model, and why?

Summary of Section 13


In this section, moving average models for stationary time series have been 
introduced. The properties of moving average models of order 1 and of arbitrary 
order q have been described. You have learned how to use the sample ACF and 
the sample PACF to decide whether a moving average model is appropriate and 
to choose its order. 





Exercise on Section 13 


Exercise 13.1 Identifying a moving average model 


Figures 13.11 and 13.12 show the correlogram and the partial correlogram for two
stationary time series.

Figure 13.11 Time series 1: (a) correlogram (b) partial correlogram

Figure 13.12 Time series 2: (a) correlogram (b) partial correlogram


For each of these two time series, decide whether or not a moving average model 
is appropriate. Explain your reasoning. If a moving average model is appropriate, 
choose the order of the model.


14 The ARIMA modelling framework


In Sections 12 and 13, autoregressive models and moving average models were
introduced. In this section, a more general modelling framework is described,
known as ARIMA modelling. This incorporates both autoregressive models and
moving average models, and also allows for differencing of time series. You have
already met most of the key ideas required: the main novelty of this section
resides in how they all fit together.

ARIMA is pronounced ‘ah-ree-mah’.


In Subsection 14.1, the ARIMA modelling framework is described. Then, in 
Subsection 14.2, you will learn how to choose an ARIMA model. Checking the 
adequacy of an ARIMA model is described in Subsection 14.3. 


14.1 ARIMA models 


As you may have suspected, the AR in ARIMA stands for autoregressive, and the 
MA for moving average. The I in the middle stands for Integrated (this is related 
to differencing). Thus an ARIMA model is an integrated autoregressive moving 
average model. 


Suppose that $X_t$ is a stationary time series with mean zero. The autoregressive
model of order p, or AR(p) model, has the form

    $X_t = \beta_1 X_{t-1} + \cdots + \beta_p X_{t-p} + Z_t,$

where $Z_t$ is white noise with mean 0 and variance $\sigma^2$. This may be rewritten as

    $X_t - \beta_1 X_{t-1} - \cdots - \beta_p X_{t-p} = Z_t.$

The term on the right-hand side is just white noise. A more general model is
obtained by replacing this term by a moving average model of order q, or MA(q)
model. Thus the model becomes

    $X_t - \beta_1 X_{t-1} - \cdots - \beta_p X_{t-p} = Z_t - \theta_1 Z_{t-1} - \cdots - \theta_q Z_{t-q}.$

This may be rewritten as follows:

    $X_t = \beta_1 X_{t-1} + \cdots + \beta_p X_{t-p} + Z_t - \theta_1 Z_{t-1} - \cdots - \theta_q Z_{t-q}.$

This is called the autoregressive moving average model of order (p,q), or
ARMA(p,q) model. If $\beta_1 = \cdots = \beta_p = 0$, then the model reduces to an MA(q)
model: the autoregressive term is said to be of order zero, and the model can be
described as ARMA(0,q). Similarly, if $\theta_1 = \cdots = \theta_q = 0$, then the moving average
term is said to be of order zero, and the model can be described as ARMA(p,0):
this coincides with the AR(p) model. If both p = 0 and q = 0, then the model is
$X_t = Z_t$. Thus the ARMA(0,0) model is the white noise model.
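
The way the autoregressive and moving average parts combine can be sketched as a simple recursion. The function below is illustrative only: its name is made up, it takes the white noise values as an input rather than simulating them, and it treats any term with a time index before the start of the series as zero, which is an assumption about start-up that the text does not address. The coefficients in the usage lines are those of the ARMA(1,2) model in Example 14.1.

```python
def arma_from_noise(z, beta=(), theta=()):
    """Build X_t = sum_i beta_i X_{t-i} + Z_t - sum_j theta_j Z_{t-j}
    from a given noise sequence z (zero-mean ARMA(p,q) recursion).
    Terms whose time index falls before the start of z are treated as zero."""
    x = []
    for t in range(len(z)):
        value = z[t]
        for i, b in enumerate(beta, start=1):      # autoregressive part
            if t - i >= 0:
                value += b * x[t - i]
        for j, th in enumerate(theta, start=1):    # moving average part
            if t - j >= 0:
                value -= th * z[t - j]
        x.append(value)
    return x

noise = [1.0, 0.0, 0.0, 0.0]                        # a single unit shock at t = 0
print(arma_from_noise(noise, beta=(0.3,), theta=(0.7, -0.1)))  # ARMA(1,2)
print(arma_from_noise(noise, theta=(0.7, -0.1)))               # ARMA(0,2), i.e. MA(2)
print(arma_from_noise(noise, beta=(0.3,)))                     # ARMA(1,0), i.e. AR(1)
print(arma_from_noise(noise))                                  # ARMA(0,0): white noise
```

Leaving `beta` or `theta` empty reproduces the special cases described above: a pure moving average model, a pure autoregressive model, or white noise when both are empty.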


Example 14.1 Some ARMA models 


The model

    $X_t = 0.3X_{t-1} + Z_t$

is an AR(1) model, so it is ARMA(1,0). Similarly, the model

    $X_t = Z_t - 0.7Z_{t-1} + 0.1Z_{t-2}$

is an MA(2) model, so it is ARMA(0,2). Combining these two models gives

    $X_t = 0.3X_{t-1} + Z_t - 0.7Z_{t-1} + 0.1Z_{t-2}.$

This is an autoregressive moving average model of order (1,2), or ARMA(1,2)
model.

Activity 14.1 Classifying ARMA models


Identify the orders p and q of each of the following models, and hence classify each 
model using the ARMA(p, q) notation. 


(a) $X_t = 0.3X_{t-1} - 0.1X_{t-2} + Z_t - 0.5Z_{t-1}$
(b) $X_t = Z_t + 0.9Z_{t-1}$
(c) Xi = —0.2X;4_4 + 0.3.X4_9 — 0.1X;_3 -j Zi = 0.12; 


For simplicity, it has been assumed that $X_t$ has mean zero. If $X_t$ has non-zero
mean $\mu$, then the general ARMA(p,q) model is obtained by substituting $(X_t - \mu)$
for $X_t$, $(X_{t-1} - \mu)$ for $X_{t-1}$, and so on.


It has also been assumed that $X_t$ is stationary. In fact, many time series are not
stationary, but can be differenced to obtain a stationary time series, as described
in Section 11. The order of differencing, represented by the letter d, is the
smallest number of times the series must be differenced to obtain a stationary
time series. Once the time series has been differenced to produce stationarity, the
ARMA(p,q) modelling framework can be used on the stationary time series.





An integrated autoregressive moving average model of order (p,d,q), or
ARIMA(p,d,q) model, is an ARMA(p,q) model applied to a time series after
differencing of order d. The reason why the model is called ‘integrated’ is 
explained in Example 14.2. 


Example 14.2 Yield on British Government securities 


Figure 14.1(a) shows the time plot of the monthly time series $X_t$ of percentage
yields on British Government securities.

This time series was introduced in Example 9.1.

Figure 14.1 Percentage yield on British Government securities: (a) time plot (b) correlogram for first differences


This time series is clearly non-stationary. In Example 11.4, you saw that
stationarity is obtained by differencing the series once. Hence the order of
differencing for this time series is 1; that is, d = 1. Let $Y_t$ denote the differenced
series:

    $Y_t = X_t - X_{t-1}, \quad t = 2, 3, \ldots .$

Note that

    $X_2 = (X_2 - X_1) + X_1 = Y_2 + X_1,$
    $X_3 = (X_3 - X_2) + (X_2 - X_1) + X_1 = Y_3 + Y_2 + X_1.$

More generally,

    $X_t = Y_t + Y_{t-1} + \cdots + Y_2 + X_1.$

Thus the original time series $X_t$ can be retrieved from the differenced series $Y_t$ by
summing the $Y_t$, and adding $X_1$ to the sum. Similarly, if an ARMA model for $Y_t$
is available, then a model for $X_t$ may be obtained by summing the models for
$Y_t, Y_{t-1}, \ldots, Y_2$, and then adding $X_1$. The addition of successive terms of a time
series is called integration. Thus the model for $X_t$ is obtained by integrating the
ARMA model for $Y_t$. This explains the letter I of ARIMA.
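
The relationship between differencing and integration can be checked numerically. In the sketch below the five values of the series are made up for illustration, and `integrate` is a hypothetical helper name; NumPy's `diff` and `cumsum` functions do the differencing and the summing.

```python
import numpy as np

def integrate(y, first_value):
    """Rebuild a series from its first differences y and its first value."""
    return np.concatenate(([first_value], first_value + np.cumsum(y)))

x = np.array([5.0, 5.4, 5.1, 5.9, 6.3])   # made-up values X_1, ..., X_5
y = np.diff(x)                             # Y_t = X_t - X_{t-1} for t = 2, ..., 5
x_rebuilt = integrate(y, x[0])             # X_t = Y_t + ... + Y_2 + X_1

# Differencing twice (d = 2) is undone by integrating twice.
w = np.diff(x, n=2)
y_rebuilt = integrate(w, y[0])
x_rebuilt_twice = integrate(y_rebuilt, x[0])

print(np.allclose(x_rebuilt, x), np.allclose(x_rebuilt_twice, x))
```

Each integration needs one starting value: the first value of the series for d = 1, and additionally the first value of the once-differenced series for d = 2.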


Figure 14.1(b) shows the sample ACF for $Y_t$, which you investigated in
Activity 13.1. This suggests that an ARMA(0,1) model (that is, an MA(1)
model) is appropriate for $Y_t$. This ARMA model for $Y_t$ can then be integrated
(that is, summed) to produce a model for the original time series $X_t$. The model
for $X_t$ is thus an integrated autoregressive moving average model of order (0,1,1),
or ARIMA(0,1,1) model.

In Activity 13.1, you saw that an MA(1) model is appropriate for the time series
of first differences.


In Example 14.2, a model for the original series $X_t$ was obtained by integrating
(that is, summing) the differenced series. The same idea applies whatever the
order of differencing, the only difference being that successive integrations are
required when $d \ge 2$. The details are omitted.








Activity 14.2 Classifying ARIMA models 


In Example 11.5, the time series of the quarterly UK index of production was
discussed. You saw that it is necessary to difference the series twice to obtain a
stationary time series. Each of the following models is suitable for the
twice-differenced series. Write down each model for the original series in the form
ARIMA(p,d,q).

(a) An MA(1) model.
(b) An AR(2) model.
(c) The white noise model.








14.2 Selecting an ARIMA model 


There are two main steps involved in selecting an ARIMA model. The first step is
to obtain a stationary time series. Prior to differencing, the time series might
need to be transformed (for instance, by taking logarithms) to ensure that it
can be represented by an additive model, and hence is stationary in variance.
Then the order of differencing, d, should be selected to ensure that the time series
is stationary in mean. Transforming time series was introduced in Subsection 2.3,
and differencing was described in Section 11.

It is assumed throughout that the time series is not seasonal.





Once a stationary time series has been obtained, the second step is to select an
appropriate ARMA(p,q) model to represent it. The ARMA models you have met
so far have all been either purely autoregressive (that is, AR(p) or
ARMA(p,0)) or purely moving average (that is, MA(q) or ARMA(0,q)).
However, the whole point of introducing the ARMA(p,q) notation is to allow
models in which both p and q are non-zero. Such models combine an
autoregressive component and a moving average component, and are called
mixed ARMA models.

Example 14.3 An ARMA(1,1) model
A stationary time series X, with mean u = 5 was simulated using the following 
ARMA(1,1) model: 
Xt = 5 = 0.6(X¢_1 = 5) + Zt 4 0.9241, 


where Z; is white noise with mean zero and variance o? = 2. The time plot of this 
time series is shown in Figure 14.2. 


Figure 14.2 Time plot of simulated ARMA(1, 1) time series


Figure 14.3 shows the correlogram and the partial correlogram for this time series. 


[Panel (a): autocorrelation against lag. Panel (b): partial autocorrelation against lag.]

Figure 14.3 Simulated ARMA(1,1) time series: (a) correlogram (b) partial correlogram


The sample autocorrelations shown in Figure 14.3(a) tail off gradually to zero, as
would be expected for an autoregressive model. The partial correlogram in
Figure 14.3(b) shows an alternating pattern, the magnitude of the sample partial
autocorrelations also tailing off to zero gradually with increasing lag. This is what
would be expected of a moving average model.

The patterns in the correlogram and the partial correlogram in Figure 14.3 are
typical of the sample ACF and sample PACF for an ARMA(1,1) time series. For
an ARMA(1,1) model, both the ACF and the PACF gradually tail off to zero, in
either an exponential or damped sinusoidal manner. In fact, this is the case for all
ARMA(p, q) models with p > 0 and q > 0: neither the ACF nor the PACF is zero
after some lag. This makes distinguishing between such models difficult. In
practice, most commonly used models have p + q ≤ 2, so this is seldom a major
problem.
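
The behaviour of the sample ACF for a mixed model can be explored by simulation. The following Python sketch generates a series from the ARMA(1,1) model of Example 14.3 and computes its sample autocorrelations. It is an illustration only: the function names, the seed, the series length and the burn-in period are all assumptions, and the simulated values will not match those plotted in Figure 14.2.

```python
import math
import random


def simulate_arma11(n, mu=5.0, beta=0.6, theta=0.9, sigma2=2.0,
                    seed=1, burn_in=200):
    """Simulate X_t - mu = beta (X_{t-1} - mu) + Z_t + theta Z_{t-1}."""
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)          # white noise standard deviation
    x_prev, z_prev = mu, 0.0        # arbitrary start, discarded via burn-in
    series = []
    for t in range(n + burn_in):
        z = rng.gauss(0.0, sd)
        x = mu + beta * (x_prev - mu) + z + theta * z_prev
        if t >= burn_in:
            series.append(x)
        x_prev, z_prev = x, z
    return series


def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


series = simulate_arma11(500)
acf = sample_acf(series, 10)
for lag, r in enumerate(acf, start=1):
    print(f"lag {lag:2d}: r = {r:6.3f}")
```

Printing the values for increasing lag should show the autocorrelations shrinking gradually rather than cutting off sharply, in line with the description above.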


Table 14.1 summarizes the key features of ARMA models, and is helpful in 
selecting an appropriate model. 


Table 14.1 Notation and key features of ARMA models

Model            Notation     ACF                 PACF
White noise      ARMA(0,0)    Zero at all lags    Zero at all lags
Autoregressive   ARMA(p,0)    Tails off to zero   Zero after lag p
Moving average   ARMA(0,q)    Zero after lag q    Tails off to zero
Mixed            ARMA(p,q)    Tails off to zero   Tails off to zero

This table refers only to lags k ≥ 1 (at lag 0 the autocorrelation is always 1).





Very often, more than one ARMA model may be appropriate for a given time
series. A useful principle when selecting a model is to keep the value of p + q to a
minimum. This is called the principle of parsimony. This principle is
illustrated in Example 14.4.


Example 14.4 The principle of parsimony 








Suppose that the model used to generate the time series described in 
Example 14.3 is not known. How might you go about selecting a model, based on 
the correlogram and the partial correlogram shown in Figure 14.3? 


Consulting Table 14.1, it is immediately clear that the white noise model is not 
appropriate, since several sample autocorrelations and sample partial 
autocorrelations exceed the significance bounds. However, other models are not 
ruled out by the properties of the ACF and PACF summarized in Table 14.1. A 
suitable justification for these other models might be as follows. 


Autoregressive: the sample ACF tails off to zero, and the sample PACF is close to 
zero after lag 4. So an appropriate model is AR(4). 


Moving average: the sample ACF is close to zero after lag 4, and the sample 
PACF tails off to zero. So an appropriate model is MA(4).


Mixed: both the sample ACF and sample PACF tail off to zero, so an appropriate 
model is ARMA(p,q) for p and q greater than or equal to 1. 





Several models are therefore consistent with the properties given in Table 14.1.
The next step is to work out how many parameters each of these various models
has. The autoregressive part of the model has p parameters, $\beta_1, \ldots, \beta_p$. The
moving average part of the model has q parameters, $\theta_1, \ldots, \theta_q$. According to the
principle of parsimony, the model with the smallest total number of parameters
p + q should be selected.


Autoregressive AR(4): p + q = 4 + 0 = 4.
Moving average MA(4): p + q = 0 + 4 = 4.
Mixed ARMA(1,1): p + q = 2. More generally, ARMA(p, q): p + q parameters.


The model with the smallest value of p + q among the plausible models is thus the
mixed ARMA(1, 1) model. So this is the best candidate.
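
The comparison in Example 14.4 amounts to choosing the candidate with the smallest p + q. A minimal Python sketch of that rule is given here; the dictionary of candidates and the function name `most_parsimonious` are hypothetical, and the rule is only a starting point, not a guarantee of a good model.

```python
# Candidate models from Example 14.4, stored as (p, q) pairs.
candidates = {
    "AR(4)": (4, 0),
    "MA(4)": (0, 4),
    "ARMA(1,1)": (1, 1),
}


def most_parsimonious(models):
    """Return the names of the models with the smallest value of p + q."""
    smallest = min(p + q for p, q in models.values())
    return [name for name, (p, q) in models.items() if p + q == smallest]


print(most_parsimonious(candidates))
```

The function returns a list because several candidates may tie on p + q, in which case further comparison is needed.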


The principle of parsimony is not guaranteed to identify the best model, or even a 
good model. Rather, it is a useful ‘rule of thumb’ to help select an initial model, 
which you can then try out, and improve upon if it is found wanting. Invoking the 
principle of parsimony is only necessary if there are several likely candidate
models. This may not always be the case: sometimes the classification in
Table 14.1 determines a clear ‘best’ candidate. Activities 14.3 and 14.4 will give 
you some practice at choosing an appropriate ARIMA model. 


Activity 14.3 ARIMA model for chemical concentrations 


In Activity 4.2, the time series of concentration levels of a chemical process was 
introduced. The time plot for the concentrations, and the time plot for the first 
differences in concentrations, are shown in Figure 14.4. 


Figure 14.4 Chemical concentrations: (a) raw data (b) first differences


Figure 14.5 shows the correlogram and the partial correlogram for the first
differences.

Figure 14.5 First differences of concentrations: (a) correlogram (b) partial correlogram

(a) Explain why d, the order of differencing required to obtain a stationary time 
series, is 1. 


(b) Use Table 14.1 to identify a plausible model, and state why the other options 
are less plausible. 


(c) Write down the model you chose in ARIMA(p, d, q) notation.


Activity 14.4 ARIMA models for the viscosity data


A time series of viscosity measurements was discussed in Examples 12.3 and 12.6. 
You saw that the time plot shown in Figure 12.5(a) suggests that the series is 
stationary. The correlogram for the time series is shown in Figure 12.5(b), and 
the partial correlogram in Figure 12.9. 


(a) Use the correlogram and the partial correlogram to identify some plausible 
models for this time series. Explain your reasons. 


(b) Use the principle of parsimony to identify one or two ‘best’ candidate models. 


(c) List these models using the ARIMA(p, d, q) notation.


14.3 Fitting and checking the model 


Once you have selected an ARIMA model, the next step is to estimate the 
parameters of the model, that is, to fit the model to the data. To fit the general 
ARIMA(p, d,q) model, the following parameters must be estimated. 


• The mean of the series (after differencing, if applicable): $\mu$.

• The p autoregressive parameters: $\beta_1, \ldots, \beta_p$.

• The q moving average parameters: $\theta_1, \ldots, \theta_q$.

• The standard deviation of $Z_t$: $\sigma$.

There are many ways of estimating the parameters of an ARIMA model. They
will not be described in this book. (You will learn how to fit ARIMA models
using SPSS in Section 15.) The basic idea of these methods is to choose
the parameter values in such a way that the 1-step ahead forecasts $\hat{x}_t$ obtained
using the fitted model are close to the observed values $x_t$, t = 1, 2, ..., n. One
measure of closeness is the SSE, the sum of the squared forecast errors:

$$\text{SSE} = \sum_{t=1}^{n} e_t^2 = \sum_{t=1}^{n} (x_t - \hat{x}_t)^2.$$

(1-step ahead forecasts, forecast errors and the SSE were introduced in Section 6.)
The calculation of the 1-step ahead forecasts and the forecast errors is described
in Example 14.5.


Example 14.5  1-step ahead forecasts and forecast errors for ARIMA 
models 


Suppose that observations $x_1, x_2, \ldots, x_n$ are collected on a stationary time
series $X_t$, and that 1-step ahead forecasts are required based on the
ARIMA(1,0,1) model

$$X_t = 0.6 X_{t-1} + Z_t + 0.9 Z_{t-1}.$$

First, choose the starting value: set $\hat{x}_1 = 0$. Then the 1-step ahead forecast error
at time t = 1 is

$$e_1 = x_1 - \hat{x}_1 = x_1.$$

For t = 2, the model formula is $X_2 = 0.6 X_1 + Z_2 + 0.9 Z_1$. To obtain $\hat{x}_2$, replace
$X_1$ by the observed value $x_1$, replace $Z_2$ by its expected value, which is zero, and
use $e_1$ to estimate $Z_1$. This gives

$$\hat{x}_2 = 0.6 x_1 + 0 + 0.9 e_1.$$

At the next time point, t = 3, the model formula is $X_3 = 0.6 X_2 + Z_3 + 0.9 Z_2$. To
obtain $\hat{x}_3$, replace $X_2$ by $x_2$, replace $Z_3$ by its expected value (zero), and use
$e_2 = x_2 - \hat{x}_2$ to estimate $Z_2$. Thus

$$\hat{x}_3 = 0.6 x_2 + 0 + 0.9 e_2.$$

The process is repeated in this way until the end of the series. A similar method
is used for all ARIMA models.

An important point to note from Example 14.5 is that the 1-step ahead forecast
error $e_t$ is an estimate of $Z_t$. This may be used to check that the model is
adequate, as follows. Since $Z_t$ is white noise, if the model is adequate, then the
time series of 1-step ahead forecast errors $e_t$ (which estimate the $Z_t$) should be
similar to white noise. In particular, the distribution of the forecast errors should
be approximately normal with mean zero and constant variance, and the forecast
errors should have zero autocorrelation.
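
The recursion of Example 14.5 can be written as a short Python sketch. The coefficients 0.6 and 0.9 and the starting value of zero are taken from the example; the function names, the parameter names `beta` and `theta`, and the three data values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def one_step_forecasts(x, beta=0.6, theta=0.9):
    """1-step ahead forecasts and forecast errors for
    X_t = beta X_{t-1} + Z_t + theta Z_{t-1}, starting from a forecast of 0."""
    forecasts, errors = [], []
    for t, observed in enumerate(x):
        if t == 0:
            forecast = 0.0                      # starting value
        else:
            # Z_t is replaced by 0 and Z_{t-1} by the previous forecast error.
            forecast = beta * x[t - 1] + theta * errors[t - 1]
        forecasts.append(forecast)
        errors.append(observed - forecast)
    return forecasts, errors


def sse(errors):
    """Sum of squared forecast errors."""
    return sum(e * e for e in errors)


x = [1.0, 2.0, 0.5]                  # made-up observations
forecasts, errors = one_step_forecasts(x)
print(forecasts, errors, sse(errors))
```

For these made-up values the forecasts are 0, 1.5 and 1.65, so the forecast errors are 1, 0.5 and -1.15. A fitting routine would repeat this calculation for different parameter values and keep those giving a small SSE.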


The adequacy of the model may thus be checked in exactly the same way as 
described for exponential smoothing in Subsections 9.2 and 9.3. The steps 
involved are as follows. 


• First, check that the distribution of the forecast errors is approximately
normal with mean zero and constant variance. You can do this by examining
a time plot and a histogram of the forecast errors.


• Then check that the autocorrelations of the forecast errors are zero at lags
k ≥ 1. This can be done by, for example, examining the correlogram for the
forecast errors, and applying the Ljung-Box test for zero autocorrelation.
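
For reference, the Ljung-Box test statistic can be computed directly from the forecast errors. The sketch below assumes the standard form of the statistic, Q = n(n + 2) times the sum of r_k^2 / (n - k) over lags k = 1, ..., m, which is not restated in this section; the function name is hypothetical. Obtaining the p value requires a chi-squared distribution function, which is left to statistical software.

```python
def ljung_box_q(errors, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1}^{m} r_k^2 / (n - k)."""
    n = len(errors)
    mean = sum(errors) / n
    c0 = sum((e - mean) ** 2 for e in errors)
    total = 0.0
    for k in range(1, max_lag + 1):
        ck = sum((errors[t] - mean) * (errors[t - k] - mean)
                 for t in range(k, n))
        r_k = ck / c0                 # sample autocorrelation at lag k
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


# A strongly alternating (hence clearly autocorrelated) made-up series.
q = ljung_box_q([1, -1] * 5, max_lag=1)
print(round(q, 4))
```

Large values of Q correspond to small p values, and so to evidence that the forecast errors are autocorrelated.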


If the forecast errors are not white noise, then the model is not adequate. In this 
case, you will have to try a different model. If several plausible models have been 
identified, with the same value of p + q, then the SSE can be used to compare
how well they fit the data: the model with the smallest SSE fits the data best. 
The method is illustrated in Example 14.6. 


Example 14.6 Comparing and checking ARIMA models 


In Activity 14.4, two plausible ARIMA models were identified for the viscosity
data: an ARIMA(2,0,0) model and an ARIMA(1,0,1) model. These two models
were fitted using a standard method.


For the fitted ARIMA(2,0,0) model, the SSE was 1069.16, whereas for the
ARIMA(1,0,1) model, the SSE was 1114.91. Thus the ARIMA(2,0,0) model fits
the data better than the ARIMA(1,0,1) model.


The time plot of the original data and the 1-step ahead forecasts obtained using
the ARIMA(2,0,0) model are shown in Figure 14.6.


Figure 14.6 Viscosity data: observed values and 1-step ahead forecasts

To verify that the model is adequate, we must first check that the forecast errors
are approximately normally distributed about zero with constant variance. The
time plot and a histogram of the forecast errors are shown in Figure 14.7.

Figure 14.7 Forecast errors: (a) time plot (b) histogram


From the time plot in Figure 14.7(a), the forecast errors fluctuate around zero,
and the fluctuations do not vary in magnitude. Hence it is reasonable to assume
that they have mean zero and constant variance. From the histogram in
Figure 14.7(b), it appears that the normality assumption is just about tenable.
However, there is a suggestion that the forecast errors might be negatively
skewed, because of the long left tail on the histogram. Nevertheless, the normality
assumption is not unreasonable.





The next step is to check that the autocorrelations are zero. Figure 14.8 shows
the correlogram for the forecast errors.

Figure 14.8 Correlogram for the forecast errors


The correlogram does not suggest that there are any non-zero autocorrelations at
lags 1 to 20. This is confirmed by the Ljung-Box test for zero autocorrelation at
lags 1 to 20. The value of the test statistic is 17.946, and the p value is 0.59.
Hence there is little evidence against the null hypothesis that all autocorrelations
at lags 1 to 20 are zero.








In conclusion, it appears that the ARIMA(2,0,0) model has adequately accounted
for the correlation structure within this time series.

Activity 14.5 will give you some practice at evaluating the adequacy of an
ARIMA model.


Activity 14.5 An ARIMA model for the British Government 
securities data 


The time series of monthly yields on British Government securities was 
introduced in Example 9.1. In Example 11.4, you saw that the series of first 
differences appears to be stationary; and in Activity 13.1, an MA(1) model was 
suggested as a reasonable model for the first differences. This suggests that the 
original data might be modelled using an ARIMA(0,1,1) model. Figure 14.9 
shows the time plot of observed and fitted values (that is, the 1-step ahead 
forecasts), and the time plot, a histogram and the correlogram for the forecast 
errors obtained using an ARIMA(0,1,1) model.

Figure 14.9 ARIMA model for securities data: (a) observed values and 1-step ahead forecasts (b) time plot of
forecast errors (c) histogram of forecast errors (d) correlogram for forecast errors


The value of the Ljung-Box test statistic for lags 1 to 20 is 24.16, and the p value
is 0.24. Discuss whether an ARIMA(0,1,1) model is adequate for these data.

In Section 9, Holt’s exponential smoothing method was used to obtain 1-step
ahead forecasts for the time series of yields on British Government securities 
discussed in Activity 14.5. In Activity 9.4, you examined the correlogram for the 
forecast errors: the correlogram shows that the forecast errors are correlated (as 
the autocorrelation at lag 1 is clearly non-zero). Thus Holt’s exponential 
smoothing method is not optimal for the time series, as the forecast errors 
obtained are not uncorrelated. On the other hand, the ARIMA(0, 1,1) model 
accounts for the underlying correlation structure of the time series. In this sense, 
the ARIMA model is an improvement over exponential smoothing for this time 
series. 


The steps involved in selecting and checking the adequacy of an ARIMA model 
are summarized in the following box. 


Selecting and checking an ARIMA model 


The steps involved in selecting an ARIMA model for a non-seasonal time
series are as follows.

• Check that an additive model is appropriate for the time series. If it is
not appropriate, transform the time series to obtain a series that can be
represented by an additive model.

• Identify the order of differencing, d, required to obtain stationarity.

• Identify those ARIMA(p, d, q) models that are consistent with the
correlogram and the partial correlogram for the stationary series.

• Choose the model(s) with the lowest value of p + q.

After fitting an ARIMA model, its adequacy should be checked, as follows.

• Check the fit of the model by examining a multiple time plot of the
time series and the 1-step ahead forecasts.

• Verify that the distribution of the forecast errors is approximately
normal with mean zero and constant variance.

• Use the correlogram for the forecast errors and the Ljung-Box test to
check that the forecast errors are uncorrelated.


More general ARIMA models are available for seasonal time series. Although they
do not involve any essentially new ideas, they are more complicated and are not
covered in this book.


Summary of Section 14 


In this section, integrated autoregressive moving average models have been 
described, and the ARIMA(p,d,q) notation has been introduced. You have 
learned how to classify ARIMA models, and how to select ARIMA models for a 
time series, making use of the principle of parsimony. You have learned how to 
check the adequacy of an ARIMA model by inspecting the 1-step ahead forecasts 
and the forecast errors from the model.


Exercises on Section 14


Exercise 14.1 Classifying ARIMA models 


The descriptions that follow relate to ARIMA models for a time series $X_t$.
Identify the values of p, d and q from these descriptions, and hence describe them 
using ARIMA(p, d, q) notation. 


(a) After twice differencing the series $X_t$, an autoregressive model of order 2 was
fitted.

(b) $X_t$ is a stationary time series with mean zero, and
E 5 eee ee ee ee 
where $Z_t$ is white noise.

(c) An ARMA(1,1) model was fitted to the series of first differences of $X_t$.

(d) $Y_t$ is stationary, and

$$Y_t - \mu = Z_t + 0.4 Z_{t-1},$$

where $Y_t = X_t - X_{t-1}$ and $Z_t$ is white noise.


Exercise 14.2 Selecting an ARIMA model 


The correlogram and the partial correlogram for a stationary time series are
shown in Figure 14.10.

Figure 14.10 A stationary time series: (a) correlogram (b) partial correlogram

(a) It is proposed to model this time series using an autoregressive model. Is this 
reasonable? Choose an appropriate value for p. 


(b) Now suppose it is suggested that a moving average model is used. Is this 
reasonable? Choose an appropriate value for q. 


(c) Finally, it is suggested that an ARMA(1,1) model is used. Do you think this 
is reasonable? Explain your answer. 


(d) From the models suggested in parts (a), (b) and (c), select a shortlist of
models using the principle of parsimony.


Exercise 14.3 Checking model adequacy


An MA(1) model is fitted to the data used in Exercise 14.2. Figure 14.11 shows a
histogram and the correlogram for the forecast errors.

Figure 14.11 Forecast errors from an MA(1) model: (a) histogram (b) correlogram


The p value for the Ljung-Box test at lags 1 to 20 is 0.01. Is an MA(1) model
adequate for this time series? Explain your reasoning.


15 ARIMA modelling in SPSS 


In this section, you will learn how to use SPSS to select, fit and check the 
adequacy of ARIMA models. You will also learn how to obtain forecasts and 
prediction intervals for these forecasts, based on a suitable ARIMA model. 


Refer to Chapter 8 of Computer Book 2 for the work in this section. 





Summary of Section 15 


In this section, you have learned how to use SPSS to obtain the correlogram and 
partial correlogram for a time series, if necessary after differencing. Fitting 
ARIMA models in SPSS has been described, along with checking model adequacy. 
You have learned how to obtain forecasts one or more steps ahead, and prediction 
limits for these forecasts.


16 Exercises on Book 2


Exercise 16.1 Sunspots data


Sunspots appear on the surface of the Sun as dark blotches. The number and size
of sunspots, and the pattern of their appearance, have fascinated scientists and
amateur astronomers for centuries. A famous data set tracks monthly sunspot
activity, as measured by the ‘sunspot number’, since January 1749. Figure 16.1
shows the time plot of the monthly sunspot numbers between January 1934 and
December 1983, a span of 50 years. (The complete data set may be obtained from
the Datasets Archive of the StatLib Index at http://lib.stat.cmu.edu/datasets/Andrews.)

Figure 16.1 Sunspot number, 1934-1983

(a) Describe the main features of this time series, and obtain a rough estimate of
the period T of the cycle.

(b) Is an additive model with a cyclical component likely to be appropriate for
these data? Explain your reasoning. (An additive model with a cyclical component
is similar to an additive model with annual seasonality, except that the cycle has
period T rather than period one year.)

The time plots for two transformations of the time series of sunspot numbers are
shown in Figure 16.2.

Figure 16.2 Transformed sunspot numbers: (a) logarithms (b) square roots


(c) Which transformation yields a time series that may be described adequately
by an additive model? Explain your answer.


Exercise 16.2 Maximum temperatures in Alaska

Figure 16.3 shows the time plot of the maximum temperature (in degrees
Fahrenheit) recorded each month in Anchorage, Alaska, between January 1954
and December 2004.

Figure 16.3 Maximum monthly temperatures (°F) in Anchorage, Alaska

(a) Describe the main features of this time plot.

(b) A decomposition of the time series yields the estimated seasonal factors
shown in Table 16.1. Explain briefly how these estimates are obtained.
Interpret the seasonal factors.

Table 16.1 Estimated seasonal factors

Month j      Estimated seasonal factor
January      -21.21
February     -16.79
March        -10.05
April          0.55
May           11.91
June          19.35
July          22.26
August        20.24
September     12.13
October       -2.61
November     -15.16
December     -20.64


(These data were obtained in July 2005 from the website http://climate.gi.alaska.edu.)

One of the plots in Figure 16.4 is the time plot of the seasonally adjusted
maximum temperatures. Simple moving averages of orders 11, 51 and 121 were
used to obtain three estimates of the trend component. The time plots of these
trend estimates are also shown in Figure 16.4.

Figure 16.4 Seasonally adjusted temperatures: data and three moving averages

(c) Which of the plots is the time plot of the seasonally adjusted series? Which
moving average was used to produce each of the other three time plots? Give
a reason for your answer.

(d) In your view, which of the three moving averages produces the best estimate
of the trend? Explain your choice.

(e) Use the time plot for the moving average you chose in part (d) to describe
the underlying trend in maximum temperatures.


Exercise 16.3 Forecasting the Dow Jones index


The Dow Jones industrial average is an index based on share prices of leading
companies. It is used to chart movements on the New York Stock Exchange.
Figure 16.5 shows the time plot of the logarithms of the closing values of the Dow
Jones index on the last day of each month between January 1988 and June 2005.
(These data were obtained in July 2005 from http://uk.finance.yahoo.com.)

Figure 16.5 Time plot of the logarithms of the Dow Jones index

(a) Describe the main features of the time plot. Is an additive model appropriate
for these data?

(b) It is required to forecast the July 2005 value using an exponential smoothing
method, applied to data from January 2000. Identify an appropriate method,
and explain your choice. (The seasonal variation in this time series is small
and may be ignored.)

(c) The first few values from January 2000 are shown in Table 16.2.

Table 16.2 Dow Jones index, January 2000 to April 2000

Date             Dow Jones index
January 2000     10 940.53
February 2000    10 128.31
March 2000       10 921.92
April 2000       10 733.91

The simple exponential smoothing method is to be applied to the logarithms
of the Dow Jones index from January 2000. Choose an appropriate starting
value.

(d) Table 16.3 shows the SSE for several values of the smoothing parameter α
for the simple exponential smoothing method.

Table 16.3 Smoothing parameter α and SSE

SSE
1.2421
0.3921
0.2545
0.1982
0.1694
0.1527
0.1426
0.1366
0.1335
0.1328
0.1342

Identify the optimal value of the parameter α. Interpret this value in terms of
the relative weight given to observations in the recent past and the more
distant past.

(e) The observed value of the logarithm of the Dow Jones index for June 2005
was 9.2375, and the 1-step ahead forecast for June 2005 is 9.2537. Use these
values to obtain a 1-step ahead forecast for the July 2005 value of the Dow
Jones index.


Exercise 16.4 Prediction limits for the Dow Jones index


Holt’s exponential smoothing method was applied to the time series of logarithms
of the Dow Jones index for the period January 1988 to June 2005. (The Dow Jones
index was described in Exercise 16.3.) Figure 16.6 shows the time plot and a
histogram of the 1-step ahead forecast errors.

Figure 16.6 Forecast errors for Holt’s method: (a) time plot (b) histogram


(a) Discuss whether it is reasonable to assume that the forecast errors are 
normally distributed with mean zero and constant variance. 


Figure 16.7 shows the correlogram for the forecast errors, at lags 1 to 20.

Figure 16.7 Correlogram for the forecast errors


(b) The length of the time series is 210. Calculate the 95% significance bounds, 
and interpret the correlogram. Is it reasonable to conclude that the forecast 
errors are white noise? Explain your answer. 


(c) The SSE is 0.3925, and the 1-step ahead forecast of the logarithm of the
Dow Jones index for July 2005 is 9.2407. Obtain the July 2005 forecast for
the Dow Jones index, and a 95% prediction interval.


Exercise 16.5 A model for the Dow Jones index

A time plot of the first differences of the Dow Jones index for the period
January 1988 to June 2005 is shown in Figure 16.8. (The Dow Jones index was
described in Exercise 16.3.)

Figure 16.8 Time plot of first differences of the Dow Jones index


(a) You may assume that the Dow Jones index is not seasonal. Is the time series
in Figure 16.8 stationary? Explain your answer.

Time plots of the first differences and the second differences of the logarithm of
the Dow Jones index are shown in Figure 16.9.

Figure 16.9 Logarithms of the Dow Jones index: (a) first differences (b) second differences

(b) Using these plots, and the time plot shown in Figure 16.5, identify the order
of differencing d of the logarithms of the Dow Jones index required to
produce a stationary time series.

Figure 16.10 shows the correlogram and the partial correlogram for the first
differences of the logarithms of the Dow Jones index.

Figure 16.10 First differences of log Dow Jones index: (a) correlogram (b) partial correlogram

(c) Interpret the correlogram and the partial correlogram.

(d) Identify a plausible model for the logarithms of the Dow Jones index, and
express this model in ARIMA(p, d, q) notation.


Exercise 16.6 Models for the maximum monthly temperatures 


The time series of maximum monthly temperatures (in °F) recorded in 
Anchorage, Alaska, between January 1954 and December 2004 was described in 
Exercise 16.2. This exercise is based on the seasonally adjusted time series, which 
is shown in Figure 16.4(a). 


The correlogram and partial correlogram for the seasonally adjusted data are
shown in Figure 16.11.

Figure 16.11 Seasonally adjusted maximum temperatures: (a) correlogram (b) partial correlogram


(a) Suggest one or more plausible ARIMA models, justifying your choice in each
case.


In Exercise 16.2, it was suggested that the level of the series may have increased
slightly over the period. Accordingly, the series of first differences of the
seasonally adjusted data were obtained. The correlogram and the partial
correlogram for the first differences are shown in Figure 16.12.

Figure 16.12 First differences of seasonally adjusted maximum temperatures:
(a) correlogram (b) partial correlogram


(b) It is proposed to fit an ARIMA(0,1,1) model. Explain why this is a
reasonable suggestion.


(c) The estimated parameters of the ARIMA(0,1,1) model are $\hat{\mu} = 0.007$,
$\hat{\theta} = 0.865$ and $\hat{\sigma} = 4.042$. Write down the model formula for the first
differences.


(d) Two models are fitted to the seasonally adjusted data: an ARIMA(2,0,0)
model and an ARIMA(0,1,1) model. Table 16.4 shows the p values obtained
for the Ljung-Box test of zero autocorrelation of the forecast errors at lags 1
to 20.

Table 16.4 p values for Ljung-Box test

Model           p value
ARIMA(2,0,0)    0.541
ARIMA(0,1,1)    0.003

Briefly discuss the relative merits of the two models. What other information
might you require?


Summary of Book 2


Part I 


Time series are observations made at equally-spaced times. A time plot may be 
used to identify the main features of a time series; these may include the trend, 
seasonal variation and irregular fluctuations. Decomposition models are used to 
represent time series in terms of these components. Additive decomposition 
models, in which the components are added, play a central role in time series 
analysis. Visual inspection of the time plot for a series can help determine 
whether an additive decomposition model is appropriate, or whether the time 
series may need to be transformed. The trend component of a non-seasonal time 
series may be estimated by smoothing the time series using a moving average. 
The order of the moving average is chosen so as to avoid either under-smoothing 
the series or over-smoothing it. For a seasonal time series, a weighted moving 
average can be used to estimate the seasonal component of the time series, and 
hence obtain a seasonally adjusted series. 


Part II


Forecasting is a central preoccupation of time series analysis. A simple forecasting 
method for non-seasonal time series with constant level is simple exponential 
smoothing. The forecasts depend on a single smoothing parameter; the value of 
this parameter may be chosen so as to minimize the sum of squared errors. More 
elaborate versions of exponential smoothing, namely Holt’s method for time series 
with a linear trend, and the Holt-Winters method for seasonal time series, are
also available. However, all forecasting methods rest upon the untestable 
assumption that the past is a good guide to the future. Assuming this to be the 
case, the accuracy of the forecasts can be quantified using prediction limits. Their 
calculation rests upon the assumption that the forecast errors are white noise. 
This may be checked by examining the distribution of the forecast errors and 
their correlogram, and by testing for zero autocorrelation using the Ljung-Box
portmanteau test. 


Part III 


Models that account for the correlation structure of a time series may help to 
improve upon exponential smoothing methods. These models are defined for 
stationary series. Stationarity may be obtained by differencing the series 
appropriately. Autoregressive models allow for long-term dependence between the 
successive terms of a time series. In contrast, moving average models allow for 
short-term dependence. Both types of models are characterized by their order. 
The order of an autoregressive model may be identified from its partial 
autocorrelation function, while the order of a moving average model may be 
identified from its autocorrelation function. Autoregressive and moving average 
models for a differenced non-seasonal time series may be combined, resulting in an 
ARIMA(p, d, q) model. An appropriate ARIMA model for a given time series may
be selected by examining the correlogram and partial correlogram for the series, 
and applying the principle of parsimony to select the simplest plausible model. 
The adequacy of this model may then be investigated by checking that the 
forecast errors are white noise, as for exponential smoothing. ARIMA models can 
readily be used to obtain forecasts and prediction intervals. 
Summary of Book 2


Learning outcomes 


You have been working to develop the following skills. 


Part I 


• Represent time series data using time plots and seasonal plots.
• Interpret time plots and seasonal plots.
• Describe the trend component, the seasonal component and the irregular
  component of a time series.
• Decide whether an additive decomposition model is appropriate for a time
  series, after transformation if necessary.
• Estimate the trend component of a non-seasonal time series using a moving
  average.
• Choose the order of a moving average for trend estimation.
• Describe the estimation of the seasonal, trend and irregular components of a
  seasonal time series that can be described by an additive decomposition
  model.
• Describe the calculation of a seasonally adjusted series.
• Interpret estimated seasonal factors and trends.
• Use SPSS to enter and plot time series data and to undertake decompositions.


Part II


• Obtain 1-step ahead forecasts using simple exponential smoothing.
• Decide when it is appropriate to use simple, Holt’s or Holt-Winters
  exponential smoothing.
• Interpret the results obtained using exponential smoothing methods.
• Choose values for the smoothing parameters for an exponential smoothing
  method.
• Assess the accuracy of forecasts using the sum of squared errors.
• Be aware of the assumptions underlying forecasting methods.
• Represent and interpret sample autocorrelations using the correlogram.
• Calculate and interpret significance bounds for sample autocorrelations.
• Test the null hypothesis of zero autocorrelation using the Ljung-Box test.
• Calculate approximate prediction intervals for 1-step ahead forecasts.
• Check the white noise assumption required for calculating prediction
  intervals.
• Use SPSS to apply exponential smoothing methods.


Part III 


• Identify whether a time series is stationary in mean and in variance.
• Obtain a stationary time series from a non-stationary one by differencing.
• Interpret the correlogram and the partial correlogram for a time series.
• Use the correlogram and partial correlogram for a time series to select
  possible ARIMA models.
• Describe the structure of an ARIMA(p, d, q) model.
• Classify an ARIMA model given its model formula.
• Choose an ARIMA model using the principle of parsimony.
• Check the adequacy of an ARIMA model.
• Use SPSS to select, fit and check the adequacy of an ARIMA model.
• Use SPSS to obtain forecasts from an ARIMA model.


Solutions to Activities


Solution 1.1 


(a) In general, the number of visits overseas seems to 
be increasing from year to year. 


(b) There seems to be a peak each year between June 
and September, and a trough between December and 
February. 


Solution 1.2 


(a) The increasing trend appears to steepen over 
time, so it is not linear. Perhaps it is roughly 
quadratic or exponential. 


(b) The size of the seasonal fluctuations appears to 
increase over time. 


Solution 1.3 


There is a marked cyclic pattern of successive highs 
and lows in these data, which is probably attributable 
to seasonal variation. There may also be a downward 
trend, though this is harder to detect owing to the 
large fluctuations in the data. 


Solution 1.4 


(a) The time plot includes three complete cycles. 
Each cycle includes two peaks: a high peak (in which 
the blood pressure reaches about 120) and a lower 
peak. 


(b) The high peaks occur at about 530, 1150 and 
1790 milliseconds, so the period is between 600 and 
650 milliseconds. 


Solution 2.1 


(a) The seasonal factors should repeat at intervals
of 4, but $s_1 = -2$ and $s_5 = s_{1+4} = -3$. So this
sequence does not represent the seasonal component.





(b) For this sequence, $s_t = s_{t+4}$ and
$s_1 + s_2 + s_3 + s_4 = 0$, so it represents the seasonal
component.


(c) For this sequence,
$s_1 + s_2 + s_3 + s_4 = -4 + 3 + 3 - 1 = 1 \neq 0$, so it does
not represent the seasonal component.
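
The two conditions used in this solution (the factors repeat with period 4, and one full cycle sums to zero) can be checked mechanically. The following is a minimal Python sketch, not part of the course software: the function name `is_seasonal_component` is hypothetical, the cycle from part (c) is simply repeated twice for illustration, and the second cycle is made up to show a sequence that passes both checks.

```python
def is_seasonal_component(factors, period, tol=1e-9):
    """Check the two conditions for a seasonal component:
    the factors repeat every `period` steps, and one cycle sums to zero."""
    repeats = all(
        abs(factors[t] - factors[t + period]) < tol
        for t in range(len(factors) - period)
    )
    sums_to_zero = abs(sum(factors[:period])) < tol
    return repeats and sums_to_zero


# Part (c): one cycle is -4, 3, 3, -1 (repeated here as an assumption).
print(is_seasonal_component([-4, 3, 3, -1] * 2, period=4))  # cycle sums to 1

# A hypothetical cycle that repeats and sums to zero.
print(is_seasonal_component([-2, 1, 3, -2] * 2, period=4))
```

The first call reports a failure because the cycle sums to 1, which mirrors the argument in part (c).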


Solution 2.2 


For the time series of overseas visits, the size of the 
seasonal fluctuations increases as the level of the time 
series increases. Thus an additive model is not 
appropriate for this time series. 





Since the time series of annual average temperatures is 
non-seasonal, there are no seasonal fluctuations. The 
level of the time series increases between 1951 and 
2004, but the irregular fluctuations do not appear to 
vary in size with the level. Thus an additive model 
may be appropriate for this time series. 
Solution 2.3


(a) The seasonal fluctuations for the log transformed 
series decrease in size as the level increases. Hence an 
additive model is not appropriate for the log 
transformed series. 





(b) From part (a), it follows that a multiplicative 
model is not appropriate for the time series of monthly 
visits overseas. In Activity 2.2, you found that an 
additive model is not appropriate for the time series. 
Hence neither an additive model nor a multiplicative 
model is appropriate for the time series. 





Solution 2.4 


(a) The size of the seasonal fluctuations is roughly 
the same, irrespective of the underlying level of the 
time series: the seasonal fluctuations are about the 
same size at the beginning of the series as they are at 
the end of the series, when the level is lower. On the 
other hand, the fluctuations of the irregular 
component may be greater at the end of the series 
than at the beginning. 


(b) The square root transformation also reduces the 
variability in the size of the seasonal fluctuations, 
though perhaps not quite as successfully as the log 
transformation. On the other hand, the irregular 
fluctuations do not appear to vary in size with the 
level of the series. 


(c) Perhaps the time series produced using the square 
root transformation might be more appropriately 
modelled by an additive model than that produced 
using the log transformation. However, the log 
transformation is more readily interpretable, so in 
practice, it might be best to try both and compare 
results. 


Solution 4.1 


(a) For a moving average of order 3, $q = 1$, so the
moving average value for 1660 is

$$y_{1660} = \tfrac{1}{3}(x_{1659} + x_{1660} + x_{1661}) = \tfrac{1}{3}(8.83 + 9.08 + 9.75) = 9.22.$$

Similarly, the moving average value for 1661 is 9.443,
and that for 1662 is 9.277.


(b) To calculate the moving average value of order 3 
for 1659 you would need the average temperature in 
1658. However, the series given in Table 4.1 starts in 
1659, so this value cannot be calculated. Similarly, to 
calculate the moving average value for 1663, you 
would need the average temperature for 1664, but this 
is not given in Table 4.1. 








(c) Using the data from Table 4.1, a moving average 
of order 5 can be calculated only for 1661. 
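
The calculation in part (a), and the end effects described in parts (b) and (c), can be illustrated with a short Python sketch. The function name `moving_average` is hypothetical, and only the three temperatures quoted in part (a) are used; the five-value list in the second call is made-up data that stands in for a series of the same length as Table 4.1.

```python
def moving_average(values, order):
    """Centred simple moving average of odd order.

    Returns one smoothed value for each position that has
    q = (order - 1) / 2 observations on both sides.
    """
    if order % 2 == 0:
        raise ValueError("order must be odd")
    q = (order - 1) // 2
    return [
        sum(values[t - q : t + q + 1]) / order
        for t in range(q, len(values) - q)
    ]


temps = [8.83, 9.08, 9.75]  # 1659, 1660, 1661, from part (a)
print(round(moving_average(temps, 3)[0], 2))

# With five observations, an order-5 average exists only for the middle one.
print(len(moving_average([1.0, 2.0, 3.0, 4.0, 5.0], 5)))
```

The returned list is shorter than the input by `order - 1` values, which is why no smoothed value is available at either end of the series.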


Solution 4.2 


(a) The higher the order, the greater is the degree of
smoothing. Hence Figure 4.5(a) was produced using a
moving average of order 3. Figures 4.5(b), 4.5(c)
and 4.5(d) were produced using moving averages of
orders 11, 25 and 51, respectively.


(b) Figure 4.5(a) seems under-smoothed: much of the 
irregular variation in the original series is still present. 
Figure 4.5(d) seems over-smoothed: much of the detail 
has been lost and the smoothed series appears too flat. 
Figures 4.5(b) and 4.5(c) seem reasonable: which 
smooths by the right amount is to some extent a 
matter of opinion. Perhaps Figure 4.5(c) provides the 
better compromise between smoothing out the noise 
and smoothing out the trend. The best course of 
action would be to discuss with a chemist whether the 
peaks and troughs that appear in Figure 4.5(b) but 
not in Figure 4.5(c) might be important. 


Solution 4.3 


(a) First, the sum of the weights is 0.9, whereas for a
weighted moving average the weights add up to 1.
Secondly, the expression contains the term $x_t^2$, whereas
only linear terms (such as $x_t$) are allowed.





(b) An appropriate weighted moving average of
order 7 is

$$\mathrm{SA}(t) = \tfrac{1}{6}\,(0.5x_{t-3} + x_{t-2} + x_{t-1} + x_t + x_{t+1} + x_{t+2} + 0.5x_{t+3}).$$


Solution 4.4 


(a) The trend appears to be downward between 1991 
and 1993. After that, the trend is roughly level, before 
dipping between 1998 and 2000, then increasing again 
after 2001. 





(b) The smoothed values, which are shown in
Figure 4.9(b), are subtracted from the values in the
original series (shown in Figure 4.9(a)) to produce a
new series $y_t$. For each quarter, the raw seasonal
factor is the average of the $y_t$ for that quarter.








(c) The average of the raw seasonal factors is

$$\bar{F} = \tfrac{1}{4}(-2949.41 + 500.18 + 647.19 + 1760.50) \simeq -10.39.$$

Thus the estimated seasonal factors are as follows:

$$\begin{aligned}
s_1 &= -2949.41 + 10.39 = -2939.02, \\
s_2 &= 500.18 + 10.39 = 510.57, \\
s_3 &= 647.19 + 10.39 = 657.58, \\
s_4 &= 1760.50 + 10.39 = 1770.89.
\end{aligned}$$


(d) Beer consumption is highest in the fourth quarter
(October to December), possibly because of high
consumption leading up to and over Christmas. It is
lowest in the first quarter (January to March).
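
The adjustment in part (c) amounts to subtracting the mean of the raw seasonal factors from each of them, so that the estimated factors sum to zero. A minimal Python sketch follows; the function name `centre_seasonal_factors` is hypothetical, and the input values are the raw factors quoted in part (c).

```python
def centre_seasonal_factors(raw_factors):
    """Subtract the mean of the raw seasonal factors from each factor,
    so that the estimated seasonal factors sum to zero."""
    mean_raw = sum(raw_factors) / len(raw_factors)
    return [f - mean_raw for f in raw_factors]


raw = [-2949.41, 500.18, 647.19, 1760.50]  # raw seasonal factors, part (c)
estimated = centre_seasonal_factors(raw)

print([round(s, 2) for s in estimated])
print(abs(sum(estimated)) < 1e-9)  # the estimates sum to zero
```

The printed estimates may differ from those in the solution by 0.01 in the last digit, because the solution rounds the mean to $-10.39$ before subtracting it, whereas the sketch uses the unrounded mean.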





Solution 4.5 


(a) The seasonally adjusted time series appears to 
show a downward trend over the period. However, 
there is considerable noise, so it is difficult to visualize 
the trend. 





(b) The trend estimate obtained with the moving 
average of order 3 is very jagged, suggesting that it is 
under-smoothed: not enough noise has been removed. 
The trend estimate obtained with the moving average 
of order 9 has removed the noise more successfully, 
without obscuring the detail. The moving average of 
order 9 is therefore preferable. 


(c) The trend declines until the end of 1993, then 
rises slowly until 1997. It then drops again until 2001, 
after which it rises. 








Solution 6.1 
(a) The weights are shown in Table S.1. 


Table S.1 Exponential weights 


                  $c_0$    $c_1$    $c_2$    $c_3$     $c_4$

$\alpha = 0.5$    0.5      0.25     0.125    0.0625    0.03125
$\alpha = 0.8$    0.8      0.16     0.032    0.0064    0.00128


(b) The weight given to the current observation is $c_0$,
which is equal to $\alpha$. Thus more weight is given to the
current observation when $\alpha = 0.8$ than when $\alpha = 0.5$.
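
The entries of Table S.1 can be reproduced with a few lines of Python. This is a sketch that assumes the weights take the usual form $c_i = \alpha(1 - \alpha)^i$, which agrees with every value in the table; the function name `exponential_weights` is hypothetical.

```python
def exponential_weights(alpha, n_weights):
    """Weights c_i = alpha * (1 - alpha)**i for i = 0, ..., n_weights - 1."""
    return [alpha * (1 - alpha) ** i for i in range(n_weights)]


for alpha in (0.5, 0.8):
    weights = exponential_weights(alpha, 5)
    print(alpha, [round(c, 5) for c in weights])
```

For $\alpha = 0.8$ the weights fall away much faster than for $\alpha = 0.5$, so the forecast depends almost entirely on the most recent observations.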


Solution 6.2 
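(a) Using expression (6.3),

$$\begin{aligned}
\hat{x}_{30} &= \alpha x_{29} + (1 - \alpha)\hat{x}_{29} \\
&\simeq 0.6 \times 18.9 + (1 - 0.6) \times 20.263\,12 \\
&\simeq 19.445\,25, \\
\hat{x}_{31} &= \alpha x_{30} + (1 - \alpha)\hat{x}_{30} \\
&\simeq 0.6 \times 17.8 + (1 - 0.6) \times 19.445\,25 \\
&\simeq 18.458\,10, \\
\hat{x}_{32} &= \alpha x_{31} + (1 - \alpha)\hat{x}_{31} \\
&\simeq 0.6 \times 19.4 + (1 - 0.6) \times 18.458\,10 \\
&\simeq 19.023\,24.
\end{aligned}$$

(b) The 1-step ahead forecast of the temperature on
15 August, using data up to and including 14 August,
is $\hat{x}_{32}$. Rounded to one decimal place, this is 19.0°C.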


The actual average temperature on 15 August was 
18.9°C. 





Solution 6.3 


(a) The optimal choice for $\alpha$ is the value which
minimizes the SSE. The SSE is smallest for $\alpha = 0.05$,
so the optimal value is 0.05. The corresponding
forecast for the 2005 precipitation is 944.5 mm.


(b) The optimal value of $\alpha$ is very low. This means
that little weight is placed on the most recent
observations.


Solution 7.1 


(a) The time series shows a very pronounced linear 
trend. The simple exponential smoothing method 
assumes there is no change in level, so it is not 
appropriate. 


(b) A suitable initial value for the level is the first
value of the series: $x_1 = 10.830\,13$. A suitable initial
value for the slope is the difference between the first
two values: $x_2 - x_1 = 10.844\,58 - 10.830\,13 = 0.014\,45$.
This is a better choice than 0 in this case because of
the pronounced trend in the time series.





(c) The optimal parameter combination is the one
that minimizes the SSE. Hence the optimal
combination is $\alpha = 1$ and $\gamma = 0.3$. The corresponding
forecast is 11.9294, so the forecasted average house
price (in pounds sterling) is $\exp(11.9294) \simeq 151\,700$.


(d) The forecast for August 2004 was too high, as it 
was based on an extrapolation of the trend. 
Subsequent forecasts gradually adjusted to the change 
in the trend. 


Solution 7.2 


(a) From Figure 7.8(b), the Holt-Winters forecasts
track the seasonal variation more closely than the
forecasts obtained using the other methods. Therefore
the SSE for the Holt-Winters method is 799, the
lowest of the three values. The SSE for Holt’s method
is 3300, and that for simple exponential smoothing
is 3387. The smallest SSE is obtained with the
Holt-Winters method because this method allows for
seasonality, which is very marked in this time series.
Holt’s method is more flexible than simple exponential
smoothing, in that it allows for a trend, so its SSE is
(marginally) lower than that obtained using simple
exponential smoothing.


(b) The Holt-Winters method is more appropriate
for these data than the other two methods, and it 
produces a lower SSE. Hence the forecast from this 
method, namely 4.8°C, is the most reliable. 


Solution 7.3 


(a) The forecast error for October 1987 is equal to 
the actual value minus the forecast for October 1987, 
that is, $1749.80 - 2406.94 = -657.14$. In previous
months, the forecast errors were much smaller. Thus 
the forecast for October 1987 is very inaccurate. 


(b) Using the Holt-Winters method might have 
improved the forecasts up to and including September 
1987. But it would not have had a major impact on 
the forecast for October 1987, because seasonality does 
not account for the big drop. 
Solution 9.1


(a) The time series lagged by 5 places is: *, *, *, *, *,
5, 1, 4, $-9$, 3.


(b) The sample autocorrelation $r_3$ is calculated using
the pairs $(5, -9)$, $(1, 3)$, $(4, -3)$, $(-9, 7)$, $(3, 0)$,
$(-3, -1)$, $(7, 8)$.

The sample autocorrelation $r_6$ is calculated using the
pairs $(5, 7)$, $(1, 0)$, $(4, -1)$, $(-9, 8)$.


(c) The autocorrelation at lag 0 is a correlation
coefficient calculated using the pairs $(x_1, x_1)$, $(x_2, x_2)$,
..., $(x_n, x_n)$. Since there is exact agreement between
the two values in each pair (and hence the
corresponding points on a scatterplot lie exactly on a
straight line), the autocorrelation is 1.
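
The pairs used for a sample autocorrelation at a given lag can be listed by pairing each value with the value that lies that many places later. The Python sketch below does this; the function name `lagged_pairs` is hypothetical, and the ten-value series is an assumption pieced together from the lagged series in part (a) and the pairs in part (b), not copied from the activity itself.

```python
def lagged_pairs(series, lag):
    """Pairs (x_t, x_{t+lag}) used in the sample autocorrelation at this lag."""
    return list(zip(series, series[lag:]))


# Series reconstructed from parts (a) and (b) of this solution (an assumption).
series = [5, 1, 4, -9, 3, -3, 7, 0, -1, 8]

print(lagged_pairs(series, 3))      # seven pairs, as in part (b)
print(lagged_pairs(series, 6))      # four pairs, as in part (b)
print(lagged_pairs(series, 0)[:3])  # at lag 0 every value is paired with itself
```

The number of pairs is the length of the series minus the lag, which is why fewer pairs are available at lag 6 than at lag 3.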





Solution 9.2 


(a) No lag stands out as having a particularly high 
(in absolute value) autocorrelation. There is no 
systematic pattern. A suitable one-sentence summary 
is as follows. The autocorrelations up to lag 20 are all 
close to zero, with no clear pattern. 





(b) If a clear pattern had emerged, or if some 
autocorrelations were far from zero, this might have 
suggested ways in which the forecasts might be 
improved. As it is, it is not clear how the forecasts 
could be improved upon. 


Solution 9.3 


(a) There is no change in the level of the time series 
of forecast errors. The size of the fluctuations does not 
appear to change, so there is no evidence that the 
variance of the forecast errors is changing 
systematically over time. (The slightly higher level at 
the beginning of the series is due to the choice of 
initial value and may be ignored.) 


(b) The significance bounds are
$\pm 1.96/\sqrt{239} \simeq \pm 0.127$.


(c) The sample autocorrelations at lags 16 and 18 
just cross the significance bounds. However, none of 
the sample autocorrelations clearly exceeds the 
bounds, hence there is little evidence of any non-zero 
population autocorrelations at lags 1 to 20. 
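
The significance bounds in part (b) depend only on the length $n$ of the series of forecast errors. The Python sketch below computes them and flags the lags whose sample autocorrelations fall outside the bounds. Both function names are hypothetical, and the three autocorrelations passed to `lags_outside_bounds` are made-up values chosen purely to show the comparison.

```python
import math


def significance_bounds(n, z=1.96):
    """Approximate 95% significance bounds -z/sqrt(n) and +z/sqrt(n)."""
    bound = z / math.sqrt(n)
    return -bound, bound


def lags_outside_bounds(autocorrelations, n):
    """Lags (counted from 1) whose sample autocorrelation lies outside the bounds."""
    lower, upper = significance_bounds(n)
    return [
        lag
        for lag, r in enumerate(autocorrelations, start=1)
        if r < lower or r > upper
    ]


print(round(significance_bounds(239)[1], 3))        # bound for n = 239
print(lags_outside_bounds([0.20, 0.05, -0.13], 239))  # hypothetical values
```

A sample autocorrelation that only just crosses a bound, such as the value $-0.13$ at lag 3 in the made-up list, is flagged by the code, so the visual judgement described in part (c) is still needed.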








Solution 9.4 


(a) The test provides strong evidence of 
autocorrelation at lags 1 to 20. 


(b) The significance bounds are
$\pm 1.96/\sqrt{252} \simeq \pm 0.123$.


(c) The sample autocorrelations at lags 3, 8 and 14 
only just cross the bounds. The only autocorrelation 
that clearly exceeds one of the bounds is at lag 1. 
Thus there is clear evidence of non-zero 
autocorrelation only at lag 1. 





Solution 9.5 


(a) The time plot in Figure 9.10(a) suggests that the
forecast errors have roughly constant variance. There
are two outliers corresponding to forecast errors
greater than 1 in absolute value, but these do not
invalidate the conclusion that the variance is constant.
Both Figure 9.10(a) and Figure 9.10(b) show that the
forecast errors are distributed roughly symmetrically
around zero. The histogram in Figure 9.10(b) is
unimodal and roughly symmetric; this suggests that
the normality assumption is valid. Finally, the
Ljung-Box test provides only weak evidence for any
non-zero autocorrelations up to lag 20. Thus the
assumptions on which the calculation of prediction
intervals is based appear valid; that is, the time series
of forecast errors may be assumed to be white noise.





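(b) An approximate 95% prediction interval is
$(\hat{x}^-_{n+1}, \hat{x}^+_{n+1})$, where

$$\hat{x}^-_{n+1} = \hat{x}_{n+1} - z\sqrt{\frac{\mathrm{SSE}}{n}} = 17.50 - 1.96\sqrt{\frac{19.89}{197}} \simeq 16.88,$$

$$\hat{x}^+_{n+1} = \hat{x}_{n+1} + z\sqrt{\frac{\mathrm{SSE}}{n}} = 17.50 + 1.96\sqrt{\frac{19.89}{197}} \simeq 18.12.$$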


(c) The forecasted concentration at 396 hours, based 
on observed concentrations at two-hourly intervals up 
to 394 hours, is 17.5, with 95% prediction interval 
(16.9, 18.1). 


Solution 9.6 


(a) For a 99% prediction interval, the 0.995-quantile
of $N(0, 1)$ is required, so $z = 2.576$. The approximate
99% prediction limits, $\hat{x}^-_{n+1}$ and $\hat{x}^+_{n+1}$, are given by

$$\hat{x}^-_{n+1} = \hat{x}_{n+1} - z\sqrt{\frac{\mathrm{SSE}}{n}} = 8.490 - 2.576\sqrt{\frac{0.3737}{205}} \simeq 8.380,$$

$$\hat{x}^+_{n+1} = \hat{x}_{n+1} + z\sqrt{\frac{\mathrm{SSE}}{n}} = 8.490 + 2.576\sqrt{\frac{0.3737}{205}} \simeq 8.600.$$


(b) A forecast and prediction interval on the original
scale may be obtained by applying the exponential
function to these values. The forecast is

$$\exp(8.490) \simeq 4866,$$

and the 99% prediction limits are

$$\exp(8.380) \simeq 4359, \qquad \exp(8.600) \simeq 5432.$$

Thus the forecasted value of the FTSE100 index is
4866, with approximate 99% prediction interval
(4359, 5432). (The actual value for February 2005 was
4968.50. This is close to the forecasted value, and lies
within the prediction interval.)
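
The two steps of this solution (prediction limits on the log scale, then back-transformation with the exponential function) can be sketched in Python as follows. The function name `prediction_interval` is hypothetical; the forecast, SSE, $n$ and $z$ are the values quoted in part (a).

```python
import math


def prediction_interval(forecast, sse, n, z):
    """Approximate limits: forecast -/+ z * sqrt(SSE / n)."""
    half_width = z * math.sqrt(sse / n)
    return forecast - half_width, forecast + half_width


# Values from part (a): log scale, 99% interval.
lower, upper = prediction_interval(forecast=8.490, sse=0.3737, n=205, z=2.576)
print(round(lower, 3), round(upper, 3))

# Part (b): back-transform the forecast and the limits to the original scale.
print(round(math.exp(8.490)), round(math.exp(lower)), round(math.exp(upper)))
```

Because the exponential function is increasing, the back-transformed limits still enclose the back-transformed forecast, although the interval is no longer symmetric about it.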








Solution 11.1 


(a) The time plot in Figure 11.3(a) displays no 
obvious trend or seasonality, and the variance seems 
roughly constant. Thus there is no reason to believe 
that the time series is not stationary. The time plot in 
Figure 11.3(b) displays marked seasonality, so this 
time series is not stationary in mean. Hence the time 
series is not stationary. 


(b) The time plot in Figure 11.3(b) was obtained 
from that shown in Figure 11.3(a) by adding a 
seasonal component. 





(c) The main difference between the two correlograms 
is the large periodic variation in Figure 11.3(d), which 
is not present in Figure 11.3(c). This periodic 
variation is induced by the seasonal component: a 
large positive value at time $t$ will tend to be followed
by large positive values at times $t + 12$ and $t + 24$ (and
also at times $t + 36$, $t + 48$, and so on) and lower
values at times halfway through the seasonal cycle,
such as $t + 6$, $t + 18$, .... This induces large positive
autocorrelations at lags 12 and 24, and large negative 
autocorrelations at lags 6 and 18. 


Solution 11.2 


The first and second differences are shown in
Table S.2. Note that there are two empty cells in the
column of second differences.


Table S.2 First and second differences


Period             Index    First difference    Second difference

Quarter 1, 1990    89.3
Quarter 2, 1990    90.6      1.3
Quarter 3, 1990    89.1     -1.5                -2.8
Quarter 4, 1990    88.3     -0.8                 0.7
Quarter 1, 1991    87.2     -1.1                -0.3
Quarter 2, 1991    86.2     -1.0                 0.1


The first differences are calculated as described in
Example 11.4. The second differences are calculated
from the first differences in the same way. For
example, the second difference for the third quarter of
1990 is

$$z_3 = y_3 - y_2 = (-1.5) - 1.3 = -2.8.$$
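
The differences in Table S.2 can be reproduced by applying the same differencing step twice. The Python sketch below uses the index values from the table; the function name `difference` is hypothetical.

```python
def difference(series):
    """First differences: each value minus the value before it."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


index = [89.3, 90.6, 89.1, 88.3, 87.2, 86.2]  # Index column of Table S.2

first = difference(index)
second = difference(first)  # second differences are differences of differences

print([round(d, 1) for d in first])
print([round(d, 1) for d in second])
```

Each pass shortens the series by one value, which accounts for the one empty cell in the column of first differences and the two empty cells in the column of second differences.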


Solution 11.3 


(a) From Figure 11.7, there is an initial increasing 
trend until the mid 1940s, then a drop, followed by a 
further increase. The size of the irregular fluctuations 
does not appear to change with the level. Taking first 
differences eliminates the trend in the original time 
series, as does taking second differences: both the first 
differences and the second differences are stationary in 
mean. 





(b) First-order differencing is sufficient to induce 
approximate stationarity in mean. 





(c) The time series represented in Figure 11.8(a) is 
stationary in mean and in variance. There is no reason 
to suppose that the autocorrelation structure varies 
with time, so it is reasonable to conclude that the 
series is stationary. 


Solution 12.1 


(a) The time plot does not display any trend over 
time, and you are told that it is non-seasonal. So it is 
stationary in mean. The size of the fluctuations does 
not vary systematically (the single big spike at t = 12 
does not amount to systematic variation). So the time 
series is stationary in variance. Hence there is no 
evidence to suggest that the series is not stationary. 





(b) There are two main features: the first two 
autocorrelations exceed the 95% significance bounds, 
and the autocorrelations tend to alternate in sign. 


(c) If an AR(1) model were appropriate, then $\beta$
would be negative, reflecting the alternating sign of
the autocorrelations.


Solution 12.2 


(a) This is not an autoregressive model: $X_t$ depends
on $X_1$ and $X_2$, but not directly on its immediate
predecessors.


(b) This is an autoregressive model of order 3, with
parameters $\beta_1 = 0.6$, $\beta_2 = -0.2$, $\beta_3 = -0.05$.


(c) This is not an autoregressive model, because it
involves $X_{t-1}^2$.


(d) This is an autoregressive model of order 2, with
parameters $\beta_1 = -0.6$, $\beta_2 = 0.1$.


Solution 12.3 


The partial autocorrelation at lag 1 in Figure 12.8(a)
is $-0.4$. The PACF is zero at lags greater than 1. So
$p = 1$ and $\beta_1 = -0.4$.


In Figure 12.8(b), the partial autocorrelation at lag 2
is $-0.6$, and the PACF is zero at higher lags. So $p = 2$
and, since $\alpha_p = \beta_p$ for an AR($p$) model, $\beta_2 = -0.6$.


Solution 12.4 


The sample partial autocorrelation at lag 1 exceeds 
the significance bounds. The sample partial 
autocorrelations at other lags are much smaller, and 
lie within or close to the significance bounds. Thus it 
is reasonable to conclude that the partial 
autocorrelations are zero at lags greater than 1. Thus 
an AR(1) model might be appropriate. 


Solution 13.1 


(a) There is no systematic variation in the level of the 
series, or in the size of the fluctuations. Hence there is 
no reason to suggest that the time series is not 
stationary. 
(b) The correlogram shows a single large value at
lag 1, and the remaining sample autocorrelations all 
lie within or only just cross the significance bounds. 
Furthermore, the partial correlogram shows an 
alternating pattern which is within the significance 
bounds for larger lags. So the suggestion that an 
MA(1) model is appropriate is reasonable. 


(c) The patterns in the sample ACF and the sample
PACF correspond roughly to those shown in
Figures 13.1(c) and 13.1(d). This suggests that the
value of $\theta$ is negative.


Solution 13.2 


(a) This is not a moving average model, as it involves
$\sqrt{Z_{t-2}}$.


(b) This is a moving average model of order 2. The
autocorrelation at lag 2 is given by Formula (13.3):

$$\rho_2 = \frac{-\theta_2}{1 + \theta_1^2 + \theta_2^2} = \frac{-0.2}{1 + 0.5^2 + 0.2^2} \simeq -0.155.$$


(c) This is not a moving average model, as the
right-hand side includes $X_{t-1}$.


(d) This is a moving average model of order 1. The
autocorrelation at lag 1 is given by Formula (13.1):

$$\rho_1 = \frac{-\theta}{1 + \theta^2} = \frac{-(-0.9)}{1 + (-0.9)^2} \simeq 0.497.$$
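
The two autocorrelations in parts (b) and (d) can be checked numerically. The Python sketch below codes only the two formulas used in this solution, with the sign convention in which the numerator is minus the moving average parameter; the function names `ma1_rho1` and `ma2_rho2` are hypothetical.

```python
def ma1_rho1(theta):
    """Lag-1 autocorrelation of an MA(1) model: -theta / (1 + theta^2)."""
    return -theta / (1 + theta ** 2)


def ma2_rho2(theta1, theta2):
    """Lag-2 autocorrelation of an MA(2) model:
    -theta2 / (1 + theta1^2 + theta2^2)."""
    return -theta2 / (1 + theta1 ** 2 + theta2 ** 2)


print(round(ma2_rho2(0.5, 0.2), 3))  # part (b)
print(round(ma1_rho1(-0.9), 3))      # part (d)
```

Since $\theta_1$ enters the lag-2 formula only through its square, the sign of $\theta_1$ does not affect the answer to part (b).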
Solution 13.3


(a) There is no systematic change in level, so the 
series is stationary in mean. There is no systematic 
change in the size of fluctuations, so the series is 
stationary in variance. Therefore there is no reason to 
suggest that the series is not stationary. 





(b) First, there is a single large (and negative) 
sample autocorrelation at lag 1, and thereafter the 
sample autocorrelations are much smaller, and mainly 
lie within the significance bounds. Secondly, the 
sample partial autocorrelations tail off in magnitude. 


(c) The order to choose is 1, since the sample 
autocorrelation at lag 1 is large in magnitude and all 
the others are close to zero. 


Solution 14.1 

(a) ARMA(2,1) 
(b) ARMA(0,1) 
(c) ARMA(3, 1) 


Solution 14.2 
(a) ARIMA(0, 2, 1) 
(b) ARIMA(2, 2, 0)


(c) Since the white noise model is ARMA(0, 0), the
model for the original series is ARIMA(0, 2, 0). 


Solution 14.3 


(a) The time plot in Figure 14.4(a) indicates that the 
series is not stationary in mean. The first differences 
in Figure 14.4(b) are stationary both in mean and in 
variance. There is no reason to suspect that the first 
differences are not stationary. Thus further 
differencing is not necessary, so d = 1. 


(b) The sample autocorrelations are close to zero 
after lag 1, whereas the sample partial 
autocorrelations tail off with increasing lag. So a 
moving average model of order 1 is a plausible choice. 


The white noise model is not very plausible since, at lag 1,
both the sample autocorrelation and sample partial 
autocorrelation exceed the significance bounds. 
Autoregressive and mixed models are also not very 
plausible as the sample ACF does not gradually tail 
off to zero, but is close to zero after lag 1. 


(c) ARIMA(0, 1, 1). 


Solution 14.4 


(a) The sample ACF is not close to zero (for the first 
few lags). This rules out the white noise model. The 
sample ACF could be interpreted as tailing off to zero 
gradually. Since the sample PACF has two large 
values at lags 1 and 2, this might suggest an AR(2) 
model. However, another interpretation is that the 
partial autocorrelations tend to alternate in a damped 
sinusoidal pattern, so perhaps a mixed ARMA (p, q) 
model is appropriate. The sample ACF could also be 
interpreted as being zero after lag 4, in which case an 
MA(4) model would be appropriate. 


(b) An ARMA($p$, $q$) model has $p + q$ parameters. For the
AR(2) model, $p + q = 2$. For the
MA(4) model, $p + q = 4$. Thus, using the principle of
parsimony, the two ‘best’ candidate models, both with
$p + q = 2$, are the AR(2) and ARMA(1, 1) models.


(c) The original time series is stationary, so d = 0. In 
ARIMA notation, the two models identified in part (b) 
are therefore ARIMA(2,0,0) and ARIMA(1, 0, 1). 





Solution 14.5 


Figure 14.9(a) shows that the 1-step ahead forecasts 
(the fitted values) closely match the observed values. 
However, to check the model adequacy in more detail, 
the forecast errors should be examined. 


Figure 14.9(b) shows that the variance of the forecast
errors is constant, and that they are distributed
around zero. Figure 14.9(c) shows that their
distribution is plausibly normal. Finally,
Figure 14.9(d) shows that the autocorrelations at
lags 1 to 20 are all small and, with perhaps one
exception, lie within the significance bounds. Since we
might expect about 1 autocorrelation out of 20 to
exceed the bounds by chance if the underlying
autocorrelations are zero, the sample ACF does not
suggest that the forecast errors are correlated. This is
confirmed by the Ljung-Box test: the p value of 0.24
provides little evidence against the null hypothesis
that the autocorrelations at lags 1 to 20 are zero.


In conclusion, the ARIMA(0, 1, 1) model is adequate
for these data. 
Solutions to Exercises


Solution 1.1 


(a) There is no seasonal component because the data 
represent annual averages. 


(b) There has been an increase of about 1°C (from 
about 9.3 to about 10.3) in the annual average 
temperature over the period. This is consistent with 
an increasing trend. 


(c) The irregular component is very marked: there is 
substantial variation from year to year. This makes it 
difficult to pick out a trend. 


Solution 1.2 


The pattern is similar for different years, suggesting 
that beer consumption varies seasonally. The seasonal 
plot indicates that beer consumption is highest in the 
fourth quarter (October to December) and lowest in 
the first quarter (January to March). 


Solution 2.1 


The seasonal variation does not appear to vary much 
in size with the level of the series. It is hard to tell 
from Figure 1.3 whether or not the irregular 
fluctuations vary in size with the level. Overall, there 
is little reason to reject the additive model. 


Solution 2.2 


(a) The time series of monthly sales shows marked 
seasonal fluctuations. The size of the seasonal 
variation increases with the level of the series. Hence 
an additive model is not suitable. 


(b) For the log transformed series in Figure 2.11(a), 
the seasonal fluctuations appear to be roughly of the 
same size, whatever the level of the series. For the 
time series of square roots, the seasonal variation 
increases with the level of the series. Thus the square 
root transformation does not produce a series which 
may be modelled by an additive model, whereas the 
log transformation might. 


Solution 4.1 


(a) Since seasonality may be ignored, only the 
irregular variation need be considered. The
transformation appears to have been successful: there 
is no suggestion that the fluctuations of the irregular 
component vary with the level of the series. 


(b) The time series in Figure 4.15(a) is the smoothest
of the three, so it must have been obtained using the
moving average with the highest order, that is,
order 19. Figure 4.15(b) is the most spiky, so it was
obtained using the moving average of order 3.
Figure 4.15(c) was obtained using the moving average
of order 11.


(c) This needs to be assessed in relation to the
original series in Figure 4.14. In Figure 4.15(a), most
of the detail has been smoothed out: perhaps this
series is over-smoothed. In Figure 4.15(b), some
short-term fluctuations remain, so perhaps this series
is under-smoothed. This leaves the series in
Figure 4.15(c) as the best compromise. But perhaps
this one is also a little over-smoothed: the small 
bumps have been flattened out quite a lot. A moving 
average of order 7, for example, might be better than 
a moving average of order 11. The ‘best’ choice 
depends to some extent on whether features such as 
the ‘small bumps’ referred to here are important, and 
this is not a purely statistical question. 


Solution 4.2 


(a) The time plot of the original series is dominated 
by marked seasonal variation. The seasonally adjusted 
series suggests that there is some variation in the level 
over the period, but there is no clear upward or 
downward trend. 


(b) The seasonal factors were estimated as follows. 
First, the original series was smoothed using a 
weighted moving average of order 13 to remove the 
seasonal fluctuations. Then this smoothed series was 
subtracted from the original series. The monthly 
averages of the resulting series were calculated next. 
Finally, the mean of these averages was subtracted 
from each average. The resulting values are the 
estimated seasonal factors. 








Temperatures in Recife are highest between December 
and March, and lowest between June and August. The 
hottest month is February, and the coldest is August. 





(c) The only difference between the two time plots is 
in the vertical scales on which they are drawn. Thus 
the degree of smoothness is the same in the two plots. 
Any apparent differences in smoothness are due to the 
different scales used. This emphasizes the importance 
of using the same scale when comparing time plots. 
Figure 4.17(a) is drawn on the same scale as the 
original data; this is the better choice of scale. 


(d) There is substantial year-to-year variation, but no 
clear upward or downward trend. 


Solutions to Exercises 


Solution 6.1 
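(a) The initial value is $\hat{x}_1 = x_1 = 17.0$, and the 
smoothing parameter $\alpha$ is 0.2, so, using (6.3), 

$$
\begin{aligned}
\hat{x}_2 &= \alpha x_1 + (1 - \alpha)\hat{x}_1 \\
          &= 0.2 \times 17.0 + (1 - 0.2) \times 17.0 \\
          &= 17.0.
\end{aligned}
$$

Similarly, 

$$
\begin{aligned}
\hat{x}_3 &= \alpha x_2 + (1 - \alpha)\hat{x}_2 \\
          &= 0.2 \times 16.6 + (1 - 0.2) \times 17.0 \\
          &= 16.92, \\
\hat{x}_4 &= \alpha x_3 + (1 - \alpha)\hat{x}_3 \\
          &= 0.2 \times 16.3 + (1 - 0.2) \times 16.92 \\
          &= 16.796, \\
\hat{x}_5 &= \alpha x_4 + (1 - \alpha)\hat{x}_4 \\
          &= 0.2 \times 16.1 + (1 - 0.2) \times 16.796 \\
          &= 16.6568.
\end{aligned}
$$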


Thus the forecasted concentration at 10 hours (which 
corresponds to time point 5) is approximately 16.66. 


(b) The SSE is given by 

$$
\begin{aligned}
\sum_{t=1}^{4} (x_t - \hat{x}_t)^2 &= (17.0 - 17.0)^2 + (16.6 - 17.0)^2 \\
  &\quad + (16.3 - 16.92)^2 + (16.1 - 16.796)^2 \\
  &= 0 + 0.16 + 0.3844 + 0.484416 \\
  &= 1.028816.
\end{aligned}
$$


Hence the SSE is 1.029 to three decimal places. 
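
The arithmetic in parts (a) and (b) can be checked with a few lines of Python. This is a minimal sketch, assuming the recursion used in the solution (each forecast is a weighted combination of the previous observation and the previous forecast, started at the first observation); the function names are illustrative and are not taken from the course software.

```python
def simple_exponential_smoothing(x, alpha):
    """Return the 1-step ahead forecasts for time points 1, ..., n + 1.

    The forecast for time point 1 is set to the first observation;
    the final entry is the forecast for the next, unobserved, time point.
    """
    forecasts = [x[0]]
    for value in x:
        forecasts.append(alpha * value + (1 - alpha) * forecasts[-1])
    return forecasts


def sse(x, forecasts):
    """Sum of squared 1-step ahead forecast errors over the observed points."""
    return sum((value - f) ** 2 for value, f in zip(x, forecasts))


concentrations = [17.0, 16.6, 16.3, 16.1]  # values used in the solution
forecasts = simple_exponential_smoothing(concentrations, alpha=0.2)
print([round(f, 4) for f in forecasts])
print(round(sse(concentrations, forecasts), 3))
```

The last forecast in the list corresponds to time point 5, and the SSE uses only the four time points for which an observation is available.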
Solution 6.2 


(a) The optimal value of α among the values listed in 
Table 6.6 is 0.3. This is the value that gives the lowest 
SSE (19.89 in this case). The corresponding forecast 
is 17.50. 


(b) The time series of forecasts will be less smooth 
with α = 0.8 than with α = 0.3, because the forecasts 
depend to a greater extent on recent observations for 
larger values of α. 





Solution 7.1 


The time series in Figure 7.11(a) has no clear linear 
trend or (visible) seasonality, so simple exponential 
smoothing is likely to be appropriate. 


The time series in Figure 7.11(b) has a marked linear 
trend, so simple exponential smoothing is definitely 
not appropriate. The fluctuations do not appear to 
vary in size with the level, so the series can be 
described using an additive model. Hence Holt’s 
exponential smoothing might be appropriate or, if 
there is seasonality (which is not clear), Holt-Winters 
exponential smoothing. 


The time series in Figure 7.11(c) has both an 
increasing linear trend and a clear seasonal cycle. 
Hence neither simple exponential smoothing nor Holt’s 
exponential smoothing is appropriate. The seasonal 
and irregular fluctuations do not vary in size with the 
level, so an additive model can be used. The 
appropriate method is Holt-Winters exponential 
smoothing. 





The time series in Figure 7.11(d) has an increasing 
trend and seasonality. However, the seasonal 
fluctuations increase in size with the level, and hence 
an additive model is not appropriate. If a 
transformation can be found such that an additive 
model is appropriate for the transformed series, then 
the Holt-Winters method could be used. 


Solution 9.1 
(a) The significance bounds are 

$$\pm 1.96/\sqrt{n} = \pm 1.96/\sqrt{109} \approx \pm 0.188.$$

The sample autocorrelation at lag 12 is −0.244, so it 
lies outside these bounds. This provides evidence 
against the null hypothesis that $\rho_{12} = 0$. 
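
The bound and the comparison in part (a) can be reproduced with a short sketch. The function names below are illustrative only; the multiplier 1.96 and the sample size 109 are those used in the solution.

```python
import math


def significance_bound(n, z=1.96):
    """Approximate 95% significance bound for a sample autocorrelation."""
    return z / math.sqrt(n)


def outside_bounds(r, n):
    """True if the sample autocorrelation r lies outside the bounds."""
    return abs(r) > significance_bound(n)


bound = significance_bound(109)
print(round(bound, 3))
print(outside_bounds(-0.244, 109))
```

Because the bound depends only on the number of observations, the same bound applies at every lag of the correlogram.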


(b) The p value of 0.148 indicates that there is little 
evidence against the null hypothesis that the 
autocorrelations at lags 1 to 20 are zero. So there is 
little evidence that any of the autocorrelations 
$\rho_1, \rho_2, \ldots, \rho_{20}$ are non-zero. 


(c) The correlogram shows that a single sample 
autocorrelation lies outside the significance bounds. 
Under the null hypothesis that all autocorrelations at 
lags 1 to 20 are zero, about 5% of these 20 (that is, 1) 
might be expected to lie outside the bounds. This is 
what was observed, so the correlogram is consistent 
with the result of the test in part (b). 


However, in practice it would be worth investigating 
the autocorrelation at lag 12 a little more before 
dismissing it as a chance effect because, for monthly 
data, lag 12 is rather special: it corresponds to the 
seasonal period. It could be that the Holt-Winters 
method is not capturing the seasonal variation 
adequately, and hence that the method could be 
improved. 











Solution 9.2 


(a) From Figure 9.12(a), the forecast errors appear to 
be distributed around zero with constant variance, and 
from Figure 9.12(b) they appear to be approximately 
normally distributed. So the assumption that the 
forecast errors are normally distributed with mean 
zero and constant variance is reasonable. 


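(b) The 95% prediction limits for the logarithm of 
the average house price in February 2005 are given by 

$$
\begin{aligned}
\hat{x}_{n+1} - 1.96\sqrt{\frac{\mathrm{SSE}}{n}}
  &= 11.937 - 1.96\sqrt{\frac{0.00621}{109}} \approx 11.922, \\
\hat{x}_{n+1} + 1.96\sqrt{\frac{\mathrm{SSE}}{n}}
  &= 11.937 + 1.96\sqrt{\frac{0.00621}{109}} \approx 11.952.
\end{aligned}
$$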
Book 2 Time series 


(c) On the original scale, the forecasted value (in 
pounds sterling) is exp(11.937) ≈ 152 800. 


The 95% prediction limits are exp(11.922) ≈ 150 500 
and exp(11.952) ≈ 155 100. 


In summary, the forecasted average house price in 
February 2005, based on data from January 1996 to 
January 2005 and rounded to the nearest £100, is 
£152 800, with approximate 95% prediction interval 
(150 500, 155 100). 


In fact, the average house price for February 2005 was 
£152 879. So the forecast was quite accurate and 
certainly within the prediction interval. 
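
The calculations in parts (b) and (c) can be sketched as follows. The function name `prediction_interval` is hypothetical. The sketch assumes, as in the solution, that the limits are the forecast plus or minus 1.96 times the square root of SSE/n, and it rounds the limits on the log scale to three decimal places before back-transforming, because that is what the worked solution does.

```python
import math


def prediction_interval(forecast, sse, n, z=1.96):
    """Approximate 95% prediction limits: forecast -/+ z * sqrt(SSE / n)."""
    half_width = z * math.sqrt(sse / n)
    return forecast - half_width, forecast + half_width


log_forecast = 11.937
lower, upper = prediction_interval(log_forecast, sse=0.00621, n=109)

# Round on the log scale first, as in the worked solution.
lower, upper = round(lower, 3), round(upper, 3)
print(lower, upper)

# Back-transform to pounds sterling, rounded to the nearest 100.
for value in (log_forecast, lower, upper):
    print(round(math.exp(value), -2))
```

Back-transforming the unrounded limits gives slightly different values in the final digits, so the order of rounding matters when comparing with the printed solution.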





Solution 11.1 


The time series is (approximately) stationary in mean, 
because there is no systematic variation in the level of 
the time series, and you are told the time series is not 
seasonal. However, the time series is not stationary in 
variance, as the size of the irregular fluctuations 
suddenly increases shortly after time point 100. 


Solution 11.2 


The first differences are calculated as described in 
Example 11.4. The same method is then applied to the 
first differences to obtain the second differences. The 
first and second differences are shown in Table S.3. 


Table S.3 First and second differences 


Time   Value    First difference   Second difference
1      1790.8   –                  –
2      1768.8   −22.0              –
3      1742.5   −26.3              −4.3
4      1802.2   59.7               86.0
5      1784.4   −17.8              −77.5
6      1857.6   73.2               91.0
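
The differences of the six values in Table S.3 can be recomputed with a short sketch. The function name `difference` is illustrative; it subtracts each value from the one that follows it, so applying it twice gives the second differences.

```python
def difference(x):
    """Return the first differences x[t] - x[t - 1] of a series."""
    return [b - a for a, b in zip(x, x[1:])]


values = [1790.8, 1768.8, 1742.5, 1802.2, 1784.4, 1857.6]
first = difference(values)
second = difference(first)

print([round(d, 1) for d in first])
print([round(d, 1) for d in second])
```

Each round of differencing shortens the series by one value, which is why the first row of the table has no first difference and the first two rows have no second difference.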


Solution 12.1 
(a) This model is autoregressive of order 2. 


(b) This model is not autoregressive, as it involves 
the term $X_{t-1}X_{t-2}$. 


(c) This model is autoregressive of order 1. 


Solution 12.2 


(a) The partial correlogram shows a large negative 
value at lag 1. All the other sample partial 
autocorrelations are close to zero, and lie within the 
significance bounds. Thus an AR(1) model would 
appear to be appropriate for these data. 


(b) For an AR(1) model, $\alpha_1 = \beta_1$. The sample partial 
autocorrelation at lag 1 is roughly −0.6, so $\hat{\beta}_1 \approx -0.6$. 


Solution 13.1 


Time series 1: A moving average model of order 2 is 
appropriate, since the sample autocorrelations at 
lags 1 and 2 are the largest in magnitude and exceed 
the significance bounds, whereas all the others are 
close to zero and lie within the significance bounds. 
The sample PACF tails off to zero in magnitude with 
increasing lag. 

Time series 2: A moving average model of order 3 is 
appropriate, since the sample autocorrelations at 
lags 1, 2 and 3 exceed the significance bounds, whereas 
all the others are close to zero and lie within the 
significance bounds. The sample PACF tails off to 
zero in magnitude with increasing lag. 


Solution 14.1 


In an ARIMA(p, d, q) model, p is the order of the 
autoregressive component of the model, d is the order 
of differencing, and q is the order of the moving 
average component. Hence the models given can be 
described as follows. 


(a) ARIMA(2, 2, 0) 
(b) ARIMA(1, 0, 2) 
(c) ARIMA(1, 1, 1) 
(d) ARIMA(0, 1, 1) 


Solution 14.2 


(a) After lag 2, the sample partial autocorrelations lie 
close to zero and do not exceed the significance 
bounds, so an autoregressive model with p = 2 might 
be appropriate. 


(b) After lag 3, the sample autocorrelations lie close 
to zero and do not exceed the significance bounds, so a 
moving average model with q = 3 might be 
appropriate. 


(c) An alternative interpretation of the correlogram 
and the partial correlogram is that both the sample 
ACF and the PACF tail off to zero with increasing lag. 
Therefore an ARMA(1,1) model might be appropriate. 


(d) The value of p + q is 2 for an AR(2) model, 3 for 
an MA(3) model, and 2 for an ARMA(1, 1) model. 
According to the principle of parsimony, the shortlist 
of models should include the AR(2) model and the 
ARMA(1, 1) model. 


Solution 14.3 


The histogram suggests that the forecast errors may 
well be normally distributed. However, the 
correlogram suggests that the forecast errors are not 
uncorrelated: there is a large positive autocorrelation 
at lag 2. This is confirmed by the Ljung-Box test: the 
p value of 0.01 provides strong evidence against the 
null hypothesis that the autocorrelations at lags 1 
to 20 are zero. Therefore it can be concluded that an 
MA(1) model is not adequate. 
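
For reference, the statistic behind the Ljung-Box test can be sketched in a few lines. This is an illustration, not the course software: the function names are hypothetical, and the sketch uses the standard form of the statistic, Q = n(n + 2) Σ r_k² / (n − k), summed over lags 1 to m. The p value is obtained by comparing Q with a chi-squared distribution; that step is omitted here to keep the example free of third-party libraries.

```python
def sample_autocorrelations(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]


def ljung_box_q(x, max_lag):
    """Ljung-Box statistic Q = n(n + 2) * sum_k r_k^2 / (n - k)."""
    n = len(x)
    r = sample_autocorrelations(x, max_lag)
    return n * (n + 2) * sum(rk ** 2 / (n - k) for k, rk in enumerate(r, start=1))


# Tiny made-up series, used only to show the functions running.
errors = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_autocorrelations(errors, 2))
print(ljung_box_q(errors, 2))
```

In practice the function would be applied to the series of forecast errors with `max_lag=20`, matching the lags tested in the solution.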




Solution 16.1 


This exercise covers some of the ideas and techniques 
discussed in Sections 1 and 2. 


(a) The time series is cyclical. There does not appear 
to be a trend: the level of the series fluctuates around 
a mean value that does not change over time. The 
irregular fluctuations appear to be larger at the tops of 
the cyclical peaks than in the troughs. Over the 
50-year period spanned by these data, there were four 
complete cycles and most of a fifth cycle. So the 
period T is a little over ten years. 





(b) An additive model would not be appropriate, 
because the size of the irregular fluctuations is greater 
at the tops of the peaks than in the troughs. 





(c) The log transformation is not appropriate because 
the size of the irregular fluctuations in Figure 16.2(a) 
is greater in the troughs than at the peaks. The 
square root transformation may be appropriate: the 
size of the irregular fluctuations in Figure 16.2(b) 
seems to be similar at all points of the cycle. 


Solution 16.2 


This exercise covers some of the ideas and techniques 
discussed in Sections 1 and 4. 


(a) The main feature of the plot is the strong 
seasonality, which dominates the plot. No trend is 
discernible, because of the strong seasonality. The 
maximum temperatures in cold months appear to be 
more variable than the maximum temperatures in 
warm months, since the lower edge of the plot is more 
ragged than the upper edge. 


(b) The seasonal factors are estimated as follows. 
First, the time series is smoothed using the following 
weighted moving average: 

$$
SA(t) = \frac{1}{12}\left(0.5X_{t-6} + X_{t-5} + \cdots + X_{t+5} + 0.5X_{t+6}\right).
$$

The values obtained are subtracted from the values in 
the original series, and the raw seasonal factors $F_j$ are 
obtained by calculating the monthly averages of these 
differences. 

Next, the average $\bar{F}$ of the $F_j$ is calculated. 
Then the seasonal factors are estimated as 
$F_j - \bar{F}$. 

The estimated seasonal factors show that the 
maximum monthly temperatures are highest in July 
and lowest in January. 


(c) The higher the order of the moving average that 
is used, the greater is the smoothing produced. Thus 
Figure 16.4(a) corresponds to the seasonally adjusted 
series, Figure 16.4(b) to the moving average of 

order 51, Figure 16.4(c) to the moving average of 
order 11, and Figure 16.4(d) to the moving average of 
order 121. 


(d) The moving average of order 11 results in a plot 
which appears under-smoothed: there is still much 
irregular fluctuation. The moving average of order 121 
produces a plot which is perhaps a little 
over-smoothed. So the moving average of order 51 is a 
reasonable compromise. However, note that this choice 
is to a large extent subjective: for example, it would 
be quite reasonable to choose Figure 16.4(d) if you 
believed that the irregularities in Figure 16.4(b) were 
due to noise. 


(e) The level appears broadly constant until about 
1970. During the 1970s there appears to be some 
fluctuation. After 1980 the level is again roughly 
constant, though perhaps slightly higher than before 
1970. 


Solution 16.3 


This exercise covers some of the ideas and techniques 
discussed in Section 6. 


(a) The time plot shows an increasing trend between 
1988 and 2000. Between 2000 and 2005, the time 
series does not show a clear trend. The irregular 
fluctuations appear not to vary in size as the level of 
the series changes. Hence an additive model is 
appropriate for this time series. 


(b) Since there is no seasonal variation (as stated in 
the question), the Holt-Winters method is not needed. 
Since the data from 2000 are to be used, and there is 
no clear trend during that period, simple exponential 
smoothing is probably the best method to use. 


(c) The starting value is the first value of the series, 
namely $x_1 = \log(10\,940.53) \approx 9.3002$. 


(d) The optimal value of the parameter α is the value 
that minimizes the SSE, so α = 0.9. This is a high 
value, indicating that forecasts depend largely on the 
most recent observations. 


(e) The 1-step ahead forecast of the logarithm of the 
Dow Jones index for July 2005 is given by 

$$
\begin{aligned}
\hat{x}_{\text{July 2005}} &= \alpha x_{\text{June 2005}} + (1 - \alpha)\hat{x}_{\text{June 2005}} \\
  &= 0.9 \times 9.2375 + (1 - 0.9) \times 9.2537 \\
  &\approx 9.2391.
\end{aligned}
$$

A forecast for the index on the original scale is 
obtained by applying the exponential function to this 
value. Thus the forecasted value of the Dow Jones 
index for July 2005 is exp(9.2391) ≈ 10 292. 


Solution 16.4 


This exercise covers some of the ideas and techniques 
discussed in Sections 7 and 9. 


(a) The time plot suggests that the forecast errors are 
distributed with mean zero and constant variance. 

(A possible exception is at the start of the series, 
where the errors tend to be negative. This is due to 
the choice of starting values, and may be ignored.) 
The histogram suggests that the distribution is 
normal. Thus it is reasonable to assume that the 
forecast errors are distributed normally with mean 
zero and constant variance. 


(b) The 95% significance bounds are 

$$\pm 1.96/\sqrt{n} = \pm 1.96/\sqrt{210} \approx \pm 0.135.$$


None of the sample correlations at lags 1 to 20 exceeds 
the significance bounds, so there is little evidence to 
suggest that the underlying autocorrelations are 
non-zero. 





A time series is white noise if it is normal with mean 
zero and constant variance, and all autocorrelations at 
lags k ≥ 1 are zero. From the results above and those 
obtained in part (a), there is no compelling reason to 
believe that this is not the case. 


(c) The 95% prediction limits for the July 2005 value 
of the log Dow Jones index are given by 

$$
\begin{aligned}
\hat{x}_{n+1} - 1.96\sqrt{\frac{\mathrm{SSE}}{n}}
  &= 9.2407 - 1.96\sqrt{\frac{0.3925}{210}} \approx 9.1560, \\
\hat{x}_{n+1} + 1.96\sqrt{\frac{\mathrm{SSE}}{n}}
  &= 9.2407 + 1.96\sqrt{\frac{0.3925}{210}} \approx 9.3254.
\end{aligned}
$$


The forecast for the Dow Jones index is obtained by 
applying the exponential function to this forecast and 
to these prediction limits: 


exp(9.2407) ≈ 10 308, 
exp(9.1560) ≈ 9471, 
exp(9.3254) ≈ 11 219. 


Thus the forecast for the July 2005 value of the Dow 
Jones index is 10 308, with 95% prediction interval 
(9471, 11 219). 


Solution 16.5 


This exercise covers some of the ideas and techniques 
discussed in Sections 11 to 14. 


(a) There is no systematic change in the level of the 
time series (and you are told that the series is not 
seasonal), so the time series is stationary in mean. 
However, the magnitude of the irregular fluctuations 
increases over time, so the series is not stationary in 
variance. Hence the series is not stationary. 


(b) The time plot in Figure 16.5 shows that the time 
series of logarithms of the Dow Jones index is not 
stationary in mean. The time plots in Figure 16.9 
suggest that both the first and the second differences 
are stationary in mean and in variance. Hence neither 
offers any evidence of non-stationarity. The order of 
differencing is the smallest number of differences 
required to obtain a stationary series, so d = 1. 





(c) None of the sample autocorrelations and sample 
partial autocorrelations at lags 1 to 20 exceeds the 
95% significance bounds. Thus there is little evidence 
that the underlying autocorrelations and partial 
autocorrelations at lags 1 to 20 are non-zero. 





(d) From part (c), a plausible model for the time 
series of first differences is the white noise model. 
Since d = 1, the model for the logarithms of the Dow 
Jones index is ARIMA(0, 1,0). 


Solution 16.6 


This exercise covers some of the ideas and techniques 
discussed in Sections 11 to 14. 


(a) The sample autocorrelation function declines 
gradually, whereas the sample partial autocorrelation 
function appears to drop close to zero after lag 2. This 
would suggest that an AR(2) model is appropriate. 
Since no differencing is involved, this corresponds to 
an ARIMA(2, 0,0) model. 


Other interpretations are possible, though perhaps 
they are less plausible. For example, the sample 
partial autocorrelations could be deemed to decline 
gradually as well as the sample autocorrelations. This 
suggests an ARIMA(p,0,q) model with both p and q 
greater than zero. Applying the principle of parsimony 
then leads to the ARIMA(1, 0, 1) model, for which 
p + q = 2, the same as for the ARIMA(2, 0, 0) model. 


(b) The correlogram cuts off abruptly after lag 1, 
while the sample partial autocorrelations decline 
gradually in absolute value. This is typical of an 
MA(1) model. Since the series has been differenced 
once, d = 1. Thus an ARIMA(0, 1, 1) model is 
reasonable. 





(c) The model formula for the first differences $Y_t$ is 

$$Y_t - 0.007 = Z_t - 0.865Z_{t-1},$$

where $Z_t$ is white noise with standard deviation 4.042. 
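
To get a feel for the fitted model in part (c), it can be simulated. The sketch below is illustrative only: the function names and the seed are assumptions, and the white noise is taken to be normal. It also evaluates the standard result that a model of the form X_t − μ = Z_t − θZ_{t−1} has lag 1 autocorrelation −θ/(1 + θ²) and zero autocorrelation at higher lags, a result that is not quoted in the solution itself.

```python
import random


def simulate_ma1(n, mean=0.007, theta=0.865, sigma=4.042, seed=0):
    """Simulate n values of Y_t = mean + Z_t - theta * Z_{t-1}.

    Z_t is normal white noise with standard deviation sigma.
    The default parameter values are those of the fitted model.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, sigma) for _ in range(n + 1)]
    return [mean + z[t] - theta * z[t - 1] for t in range(1, n + 1)]


def ma1_lag1_autocorrelation(theta):
    """Theoretical lag 1 autocorrelation of X_t = Z_t - theta * Z_{t-1}."""
    return -theta / (1 + theta ** 2)


differences = simulate_ma1(200)
print(len(differences))
print(round(ma1_lag1_autocorrelation(0.865), 3))
```

A negative lag 1 autocorrelation followed by values near zero is the pattern that a correlogram cutting off after lag 1 would show for such a model.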


(d) The p value of 0.003 for the ARIMA(0, 1, 1) 
model provides strong evidence against the null 
hypothesis that the forecast errors are uncorrelated at 
lags 1 to 20. Thus the ARIMA(0, 1, 1) model is not adequate. 


The p value of 0.541 for the ARIMA(2, 0,0) model 
provides little evidence against the null hypothesis of 
zero correlation of the forecast errors at lags 1 to 20. 
To decide whether the ARIMA(2, 0,0) model is 
adequate, it is necessary to check that the distribution 
of the forecast errors is normal with mean zero and 
constant variance. This can be done by examining the 
time plot and a histogram of the forecast errors. 